Abstract. There have long been threads of investigation into covert
channels, and threads of investigation into anonymity, but these two
closely related areas of information hiding have not been directly
associated. This paper represents an initial inquiry into the relationship
between covert channel capacity and anonymity, and poses more questions
than it answers. Even this preliminary work has proven difficult,
but in this investigation lies the hope of a deeper understanding of the
nature of both areas. MIXes have been used for anonymity, where the
concern is shielding the identity of the sender or the receiver of a message,
or both. Traffic analysis prevention (TAP) methods are used to
conceal larger traffic patterns. Here, we are concerned with how much
information a sender to a MIX can leak to an eavesdropping outsider,
despite the concealment efforts of MIXes acting as firewalls.


Introduction 

Traffic  analysis  in  network  communication  can  be  used  to  open  a  covert 
channel  from  Alice  to  Eve  [12,13,23-25].  In  this  paper  we  discuss  a  particular 
covert  channel  that  exists  in  an  anonymizing  network.  We  present  some  simplified 
scenarios  as  a  first  step  in  this  analysis. 

*  Research  supported  by  the  Office  of  Naval  Research. 

There is always one special transmitting node in a network called Alice. Alice
and possibly other transmitters have legitimate business transmitting messages
to a set of Receivers {R_i | i = 1, 2, ..., M}. These transmitters act completely
independently  of  one  another,  and  have  no  direct  knowledge  of  each  other’s 
recent  transmission  behavior.  Alice  may  have  some  general  knowledge  of  the 
long-term  traffic  levels  produced  by  the  other  transmitters,  e.g.,  the  number 
of  other  transmitters  and  their  probabilistic  behavior,  which  can  allow  Alice  to 
write  a  code  that  can  improve  the  covert  communication  channel’s  data  rate.  She 
cannot,  however,  perform  short-term  adaptation  to  their  behavior.  Our  simplified 
communication  is  one-way  (the  receivers  never  send  to  Alice  or  to  the  other 
transmitters).  We  also  assume  that  there  is  a  clock,  and  that  transmissions  only 
occur in the unit interval of time called a tick. Any subset of transmitters can
each  either  send  a  single  message  to  a  single  receiver  in  a  tick,  or  not  send  a 
message  at  all.  Each  transmitter  in  a  tick  can  send  to  a  different  receiver,  and 
two  or  more  transmitters  may  send  to  the  same  receiver  in  the  same  tick.  All 
messages’  contents  are  encrypted  end-to-end. 

There is also an eavesdropper on the network called Eve. Since all transmissions
are encrypted, they appear to the eavesdropper Eve as having indistinguishable
content. Eve may be either a global passive adversary (GPA), with
the  ability  to  see  link  traffic  on  every  link  in  the  network,  or  a  restricted  passive 
adversary  (RPA),  with  the  ability  to  observe  traffic  only  on  certain  links. 

Alice  is  not  allowed  any  direct  communication  with  Eve.  However,  Alice  can 
influence  what  Eve  sees  on  the  network.  We  present  several  different  scenarios 
and  analyze  the  subtle  ways  by  which  Alice  may  indirectly  communicate  with 
Eve.  In  particular,  we  study  network  scenarios  that  attempt  to  achieve  a  degree 
of  anonymity  with  respect  to  the  network  communication.  That  is,  the  networks 
are  designed  with  various  anonymity  devices  to  prevent  Eve  from  learning  who 
is  sending  a  message  to  whom.  Even  if  a  certain  degree  of  anonymity  is  achieved, 
it  still  may  be  possible  for  Alice  to  communicate  covertly  with  Eve.  Please  keep 
in  mind  that  anonymous  communication  networks  were  not  designed  with  this 
covert  channel  threat  in  mind.  Rather,  it  was  our  study  of  these  anonymity 
networks  that  caused  us  to  realize  that  even  in  what  appears  to  be  a  benign 
form  of  communication,  information  may  still  leak  out  of  the  network,  contrary 
to  the  intent  of  system  design. 

The  main  thrust  of  this  paper  is  to  analyze  the  situation  where  there  are 
two  enclaves,  communication  between  them  is  encrypted,  and  packets  are  sent 
only  from  the  first  enclave  (which  contains  Alice)  to  the  second  (please  refer  to 
Figure  1).  Eve  is  able  to  monitor  the  communication  from  the  first  enclave  to 
the second. Anonymity is “achieved” in that an eavesdropper such as Eve (as
RPA) knows neither who is sending a message (that is hidden inside the first
enclave) nor who is receiving the message (this can only be known if one is
interior to the second enclave). Eve is only allowed to know how many messages
per  tick  travel  from  the  first  enclave  to  the  second.  Nonetheless,  Alice  attempts 
to  communicate  covertly  with  Eve. 

Fig. 1. Restricted Passive Adversary Model.


This  paper  analyzes  the  covert  communication  channel  from  Alice  to  Eve.  We 
show  that  even  if  anonymity  is  taken  into  consideration  with  respect  to  system 
design,  covert  channels  may  remain.  As  a  baseline,  we  first  consider  situations  in 
which  no  attempt  at  anonymity  has  been  made  (only  encryption  of  the  messages, 
so  that  they  all  appear  to  be  identical  to  an  eavesdropper).  Later,  we  will  consider 
covert  channel  capacity  in  networks  with  the  stronger  anonymity  controls  just 
described.  This  paper  concludes  with  a  summary  and  some  directions  for  future 
research. 

1  Base  Scenario  —  No  anonymity 

One  transmitter 

Alice  is  the  only  transmitter,  and  there  are  M  possible  receivers.  Eve  has 
knowledge  of  the  network  traffic  (Eve  is  a  GPA  —  see  Figure  2).  The  only 
properties that Eve can discern from a message are its source (trivially Alice) and
its  destination.  Alice  can  use  that  fact  to  send  information  covertly  to  Eve.  In 
this  simplistic  scenario  Eve  can  see  if  Alice  is  sending  a  message,  and  if  Alice  is 
sending  a  message  Eve  can  determine  for  which  receiver  the  message  is  meant. 
This  gives  Alice  the  ability  to  signal  Eve  with  an  alphabet  of  M  +  1  symbols: 
M  symbols  for  the  M  different  receivers,  and  one  symbol  (“0”)  for  the  choice  of 
not  sending  a  message. 

Since nothing is able to interfere with Alice’s transmission, we have a noiseless
discrete memoryless channel (DMC) modeling the covert channel, whose
capacity is log(M + 1) bits per tick.1
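
As a small numerical illustration of the noiseless case, the following sketch evaluates log(M + 1) for a few values of M. The function name `noiseless_capacity` is our own label and is not part of the original analysis.

```python
import math


def noiseless_capacity(num_receivers: int) -> float:
    """Capacity, in bits per tick, of the noiseless base-scenario channel.

    Alice has num_receivers + 1 distinguishable symbols per tick: one per
    receiver, plus the "0" symbol for not sending at all.
    """
    if num_receivers < 0:
        raise ValueError("num_receivers must be non-negative")
    return math.log2(num_receivers + 1)


if __name__ == "__main__":
    for m in (1, 3, 7):
        print(m, noiseless_capacity(m))
```

With a single receiver Alice has two symbols (send or stay silent), so the channel carries one bit per tick; each doubling of M + 1 adds one more bit.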

Several  transmitters 

Now,  if  there  are  other  transmitters  aside  from  Alice,  but  their  transmissions 
to  any  of  the  M  receivers  do  not  affect  Alice’s  transmissions,  then  the  covert 
channel  from  Alice  to  Eve  is  as  above.  This  would  be  the  case  if  the  links  into 
a receiver can handle all of the traffic meant for them. Of course, if the link
capacity into a transmitter does affect the number of receivable transmissions
then that introduces noise into the channel and the capacity is obviously less
than log(M + 1). This is a course of research worth pursuing.

1 All logarithms are base 2, and we will also adopt the convenience of no longer stating
the units of the capacity. The units will be understood to be bits per tick.

Fig. 2. Global Passive Adversary Model.

Anonymity  discussion 

In  the  above  scenario  Alice  can  obviously  leak  considerable  information  to  Eve. 
This is no secret to the anonymity community, e.g., [1-4,14,15,18,6,20] (while
the  preceding  list  is  only  a  representative  sample  of  papers/URLs  on  the  topic, 
these  papers  relate  particularly  well  to  what  we  discuss  in  this  paper).  However, 
in  the  past  the  concerns  have  focused  on  retaining  or  regaining  anonymity.  It 
is  the  “anonymity  lost”  that  we  exploit  for  covert  communication.  If  there  were 
“perfect” anonymity,2 then we would not expect to find a covert channel.

To  provide  anonymity,  transmissions  from  a  transmitter  are  often  first  sent 
to  an  intermediary,  such  as  a  MIX  [4]  or  an  onion  router  [14],  before  they  are 
forwarded  to  the  receiver.  This  has  the  effect  of  hiding  whither  the  message  is 
going.  Thus,  these  intermediaries  serve  to  anonymize  the  transmission.  Of  course, 
Eve  still  knows  the  set  of  those  who  receive  a  message,  and  she  also  knows  the 
set of those who sent a message, but she does not know who sent a message
to whom. It is interesting that, even when we seem to have “good” statistical
anonymity, Alice may still non-trivially be able to communicate covertly with
Eve.

2 We intentionally leave the notion of perfect anonymity as fuzzy in this paper. We ask
the reader, though, the somewhat circular question: If we did have perfect anonymity,
how could we have covert communication?

The  use  of  a  MIX  alone  does  not  prevent  Alice  from  covert  communication 
with  Eve.  In  fact  there  are  two  possible  situations. 

1.  Alice  signals  Eve  by  sending  or  not  sending  a  message.  A  MIX  alone  does 
nothing  to  prevent  Eve  from  learning  this  information  (this  is  not  what  a 
MIX  is  designed  to  do).  We  discuss  this  further  at  the  beginning  of  the  next 
section.  Therefore  Alice  has  a  noiseless  channel  to  Eve,  with  a  capacity  of 
one. 

2.  Alice  signals  Eve  by  sending  a  message  to  any  one  of  M  different  receivers. 
If  Alice  is  the  only  transmitter,  Eve  simply  sees  where  messages  are  going 
when  they  leave  the  MIX  (a  concern  well-known  to  MIX  designers).  This 
allows  a  covert  channel  with  a  capacity  of  log(M  +  1).  If  there  are  other 
users,  their  behavior  affects  what  Eve  is  receiving  and  the  capacity  is  then 
less than log(M + 1).

We  will  not  study  the  latter  situation  in  this  paper,  because  we  do  not  use 
pure  MIXes.  Instead,  we  use  MIXes  acting  as  firewalls. 

2  Scenario  2:  Indistinguishable  Receivers — Two 
MIX-firewalls 

Consider the situation in which every message goes into the anonymizing intermediary
referred to as a MIX [4]. The MIX has the effect of hiding the “linking”
knowledge  of  which  transmission  is  sent  to  which  receiver.  In  other  words,  Eve 
knows  who  is  transmitting  and  who  is  receiving,  but  in  general,  Eve  does  not 
know  which  transmitter  is  sending  to  which  receiver.  This  assumes  that  Eve  is  a 
GPA.  Of  course,  if  only  one  transmitter  is  operating  then  the  MIX  hides  nothing. 
In  other  words  the  MIX  gives  statistical  anonymity.  The  amount  of  anonymity 
has been measured as the log of the number of transmitters (anonymity set size),
sometimes  in  conjunction  with  probabilistic  behavior  (e.g.,  [2-4,6,20]). 

The  main  concern  of  this  paper  is  not  with  measuring  anonymity,  rather 
it  is  the  amount  of  covert  information  that  may  be  leaked  through  less  than 
perfect  anonymity.  However,  we  do  note  the  very  important  observation  from  our 
research: the ability to covertly communicate arises due to a lack of anonymity.
As  the  number  of  transmitters  goes  up  and  as  the  transmitters  behave  in  a 
“uniform  (equi-probabilistic)  manner,”  the  anonymity  increases  and  we  will  show 
that  the  covert  channel  capacity  diminishes. 

For Scenario 2 we assume that there are transmitters Alice and Clueless_i, i =
1, ..., N. The N Clueless_i transmitters behave independently of each other and
of Alice, and they all have the same time-invariant probabilistic behavior. Alice
and the Clueless_i are hidden from Eve. They submit their messages to a MIX
that also functions as a firewall. This first MIX-firewall acts as an exit point.
This MIX-firewall sends its encrypted messages to a second MIX-firewall that
is an entrance to a second hidden (from Eve) enclave. We further assume that
Eve is a GPA only between the two MIX-firewalls, i.e., an RPA. That is, Eve
only  has  knowledge  of  how  many  messages  come  out  of  the  first  MIX-firewall 
per  tick,  and  Eve  does  not  know  to  whom  the  messages  are  going.  The  situation 
is  described  by  the  following  diagram  (Figure  3). 


Fig. 3. MIX-firewalls with Restricted Passive Adversary.


This  situation  is  realistic3  if  the  MIXes  are  acting  as  (first)  firewall  exit 
and  (second)  entrance  points,  or  if  the  MIXes  are  onion-type  routers  acting 
as  firewalls.  Therefore,  as  stated  above,  we  assume  throughout  this  scenario 
that  Eve  only  has  knowledge  of  the  number  of  messages  coming  out  of  a  MIX 
acting  as  a  firewall.  Transmitters  are  allowed  at  most  one  transmission  per  tick. 
Alice  attempts  to  signal  Eve  by  transmitting  to  one  of  M  possible  receivers 
(which  receiver  Alice  transmits  to  is  immaterial),  or  by  not  transmitting  at  all. 
However, Clueless_i is also transmitting without any regard to what Alice is doing.
The transmissions from both Alice and Clueless_i go into the first MIX-firewall,
which acts as an exit point. Alice does not know what Clueless_i is doing (this
assumption is made throughout the paper). Eve sees messages coming out of the
first MIX-firewall on their way to the second MIX-firewall, but does not know
who sent them, or where they are going. All messages go into the second MIX-firewall,
which sends them to their receivers. Every tick, Alice and each Clueless_i
either send or do not send one message each. Therefore, the only knowledge
that Eve can get by eavesdropping is the number of messages per tick passing
between the two MIX-firewalls. In other words, every tick, Eve observes the
number of packets leaving the MIX-firewall and “receives” some number from
the set {0, 1, ..., N + 1}.
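
To make Eve's view concrete, here is a minimal simulation sketch of a single tick. It assumes each Clueless transmitter sends independently with the same probability; the names `eve_count` and `send_prob` are illustrative and do not appear in the original analysis.

```python
import random


def eve_count(alice_sends: bool, n_clueless: int, send_prob: float,
              rng: random.Random) -> int:
    """Number of messages Eve counts between the MIX-firewalls in one tick.

    Each of the n_clueless transmitters sends with probability send_prob,
    independently of Alice and of one another (an assumed noise model).
    """
    clueless_msgs = sum(1 for _ in range(n_clueless) if rng.random() < send_prob)
    return int(alice_sends) + clueless_msgs


if __name__ == "__main__":
    rng = random.Random(0)
    # Ten ticks in which Alice always sends and N = 3.
    print([eve_count(True, 3, 0.5, rng) for _ in range(10)])
```

Whatever Alice does, the count stays in {0, 1, ..., N + 1}; Alice can only shift it by one, and the Clueless transmitters blur that shift.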

Therefore the only quantity observable by Eve that Alice can affect, per tick,
is the number of messages that Eve counts. This covert channel is a discrete
memoryless channel with noise since the Clueless_i's randomly affect the output.
How does Eve regard the transmissions? What is the most information that Alice
can send to Eve in this manner? Shannon’s information theory [21] answers these
questions for us.

3 Consider the case of packets from one LAN/enclave being sent to another
LAN/enclave using IPSEC tunneling [8]. In this case, an eavesdropper can only
count the number of outgoing messages destined for the receiving enclave. What
goes on inside each LAN/enclave is hidden from an eavesdropper. If UDP with no
application level ACKs is employed, communication is only one-way [16].

Let  us  go  back  to  the  base  scenario;  here  we  stated  that  the  capacity  is 
obviously log(M + 1). How do we know that some other exploitation of the base
scenario will not give us a higher capacity? The reason is that there are at most
M + 1 symbols in whatever exploitation we use, and if the channel is noiseless
we  have  maximized  the  capacity  (this  is  related  to  the  maximum  entropy  as 
discussed  in  [11].)  For  Scenario  2  capacity  cannot  be  explained  so  easily  and  is 
the  major  study  of  this  paper. 

Keep  in  mind  that  for  Scenario  2  it  does  not  matter  if  there  is  one  receiver 
or there are one hundred and one receivers. Eve can only count, and Alice or
Clueless_i can only send one message per tick. Therefore the number of receivers
does not matter. It is only important that there is at least one receiver.

We  break  Scenario  2  down  into  four  cases:  2.0,  2.1,  2.2,  and  2.3.  Case  2.3  is 
the  general  form  of  Scenario  2  and  the  first  three  are  simplified  special  cases. 


2.1 Two special cases of Scenario 2: Alice alone, and Alice with
one additional transmitter

Case  2.0  —  Alice 

This is the case where N = 0. Alice is the only transmitter. Alice sends either 0
(by not sending a message) or 0^c (by sending a message; it does not matter to
which receiver Alice sends the message since that is indistinguishable to Eve).
Eve receives either e_0 = 0 (Alice did nothing) or e_1 = 1 (Alice sent a message to
a  receiver).  There  is  no  noise  in  this  channel.  The  capacity  of  this  covert  channel 
is  1. 

We develop the necessary information theory further on in the paper. However,
we state the capacity is the maximum, over the probability x for Alice
inputting a 0, of the mutual information I(E, A). A is the distribution for Alice
described by x, and E is the distribution for Eve. Since there is no noise, I is
simply the entropy H(E) describing Eve.

$$I(E, A) = H(E) = -x \log x - (1 - x) \log(1 - x),$$

which is maximized to 1 when x = .5.
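
The maximization for Case 2.0 can be checked numerically. The sketch below evaluates the binary entropy on a grid of x values; the grid spacing of .001 and the function name `binary_entropy` are our own choices for illustration.

```python
import math


def binary_entropy(x: float) -> float:
    """H(E) = -x log2 x - (1 - x) log2(1 - x), taking 0 log 0 as 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


if __name__ == "__main__":
    grid = [i / 1000 for i in range(1001)]
    best_x = max(grid, key=binary_entropy)
    print(best_x, binary_entropy(best_x))
```

The entropy is symmetric about x = .5 and falls to zero at x = 0 and x = 1, where Alice's behavior is fully predictable and therefore carries no information.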

Case  2.1  —  Alice  and  one  additional  transmitter  (Clueless) 

In this case N = 1. Therefore, Eve receives:

—  0  if  neither  Alice  nor  Clueless  transmit; 

—  1  if  Alice  transmits  and  Clueless  does  not,  or  Clueless  transmits
and  Alice  does  not;  or

—  2  if  both  Alice  and  Clueless  transmit. 

In  the  remainder  of  subsection  2.1  we  develop  the  information  theory  to  analyze 
the  covert  channel  for  Case  2.1. 

Let us model the communications channel as follows: A is the input random
variable describing Alice, and E is the output random variable describing Eve.
Clueless contributes to the noise, but is not modeled as an input. Alice communicates
with Eve via the covert channel. The input symbols for the channel are
0, which signifies that Alice is not transmitting a message to any receiver, and
0^c, which signifies that Alice is transmitting a message to some receiver.4


Fig. 4. Channel model for Case 2.1. (a) Channel block diagram.


4 At this point we caution the reader not to confuse Alice transmitting a message to
a receiver R_i, and Alice communicating to Eve via the covert channel. Eve is not
the receiver R_i in the sense of Alice or Clueless transmitting a message. Eve receives
symbols  via  the  covert  channel  from  Alice.  There  are  two  different  communication 
paths  that  must  be  kept  separate.  One  is  the  legitimate  network  communication 
that  the  anonymizing  device  attempts  to  keep  unknown.  The  other  is  the  covert 
communication  that  Alice  has  to  Eve.  A  way  to  stop  the  covert  communication  would 
be  for  the  anonymizing  device  to  pad  [11-13, 23, 24]  messages  so  that  it  would  appear 
to  Eve  that  both  Alice  and  Clueless  are  transmitting  a  message.  This  inefficiency 
might  be  tolerated  in  such  an  ideal  situation  as  Case  2.1,  but  such  a  strategy  must  be 
called  into  question  when  it  comes  to  real  traffic.  In  Case  2.1  the  anonymizing  effect 
is  done  by  a  MIX-firewall,  which  does  not  a  priori  pad.  Of  course,  before  advocating 
traffic  padding  one  should  be  fully  aware  of  the  threat  that  the  padding  is  intended 
to  stop.  Failure  to  understand  the  threat  first  is  inadvisable  since  padding  comes  at 
the  pragmatic  costs  of  efficiency  and  proper  network  resource  utilization. 

Figure 4 shows two ways to look at the channel. The top part (a) of the figure
is the simple schematic. A is the input, E is the output, and the anonymizing
network (the two MIX-firewalls between the transmitters and receivers) adds
noise. The bottom part (b) of Figure 4 shows that the input symbols are: 0,
which represents A not sending a message; and 0^c, corresponding to A actually
sending a message to one of the M possible receivers. The output symbols correspond
to the three states E might perceive. The output symbol 0 corresponds to
no one sending a message; the output symbol 1 corresponds to Alice or Clueless,
but not both, sending a message; and the output symbol 2 corresponds to both
Alice and Clueless sending a message.

Let us consider the channel matrix.

$$
M_{2.1} =
\begin{array}{c|ccc}
      & 0 & 1 & 2 \\ \hline
0     & p & q & 0 \\
0^{c} & 0 & \alpha & \beta
\end{array}
$$

The 2×3 channel matrix M_{2.1}[i, j] represents the conditional probability of Eve
receiving the symbol j when Alice sends the symbol i,

$$M_{2.1}[i, j] = P(E = j \mid A = i).$$

We will show that p = α, and thus it trivially follows that q = β.

The probability P(· | A = i) is totally dependent upon what Clueless
does (the action of Alice is already fixed at A = i, by the fact that it is a
conditional probability). Let us consider what happens when Clueless sends a
message, and assign a probability 1 − ζ to Clueless sending the message.5 Consider
P(E = 0 | A = 0) and P(E = 1 | A = 0^c). The only way for Eve to receive a 0,
when Alice has not sent a message, is for Clueless not to have sent a message.
Therefore, P(E = 0 | A = 0) = ζ. The only way for Eve to receive a 1, when
Alice has sent a message, is for Clueless not to have sent a message. Therefore,
we also have P(E = 1 | A = 0^c) = ζ. Thus p = ζ = α, and q = β = 1 − p. So our
channel matrix simplifies to:

$$
\begin{array}{c|ccc}
      & 0 & 1 & 2 \\ \hline
0     & p & q & 0 \\
0^{c} & 0 & p & q
\end{array}
$$
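
The simplified channel matrix is easy to write down in code. The sketch below is only an illustration; the function name `channel_matrix` is ours, and p is taken, as in the text, to be the probability that Clueless does not send in a tick.

```python
def channel_matrix(p: float) -> list[list[float]]:
    """Channel matrix for Case 2.1.

    Rows are Alice's inputs (0, then 0^c); columns are Eve's counts (0, 1, 2).
    p is the probability that Clueless does NOT send in a tick.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    q = 1.0 - p
    return [
        [p, q, 0.0],  # Alice silent: Eve counts 0 (Clueless silent) or 1
        [0.0, p, q],  # Alice sends: Eve counts 1 (Clueless silent) or 2
    ]


if __name__ == "__main__":
    for row in channel_matrix(0.25):
        print(row)
```

Each row is a probability distribution over Eve's three possible counts, so each row sums to one.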

5 We will assume from now on that such a distribution can be assigned, and further
that the distribution is stationary (it is the same each tick). Without such an assumption
we can still study the problem, but if the distribution is non-stationary the
analysis becomes much more difficult since the channel is no longer memoryless. We
do not feel though that assigning Clueless a stationary distribution is that onerous.
The distribution could be assigned via statistical analysis of past behavior (to make
this valid one should assume that Clueless is not adapting to Alice’s behavior). If
one cannot assign a random variable to Clueless then our analysis is erroneous.

We wish to determine the channel capacity of the above discrete memoryless
channel. We let the probability that Alice sends a 0 be P(A = 0) = x,
and therefore P(A = 0^c) = 1 − x. The term x is the only term that can be
varied  to  achieve  capacity.  Here  is  where  Alice  may  use  knowledge  of  long-term 
transmission  characteristics  of  the  other  transmitters,  as  well  as  how  many  other 
transmitters  there  are,  to  change  her  (long-term)  behavior.  As  with  other  studies 
of  covert  channels  [10]  we  are  not  concerned  with  source  coding/decoding  issues 
[21].  Our  concern  is  the  limits  on  how  well  a  transmitter  can  “optimize”  its  bit 
rate  to  a  receiver,  given  that  a  channel  is  noisy.  Given  a  discrete  random  variable 
X, taking on the values x_i, i = 1, ..., n_x, the entropy of X is:

$$H(X) = -\sum_{i=1}^{n_x} p(x_i) \log p(x_i).$$

We use p(x_i) as a shorthand notation for P(X = x_i). Given two such discrete
random variables X and Y we define the conditional entropy (equivocation) to
be:

$$H(X \mid Y) = -\sum_{i=1}^{n_y} p(y_i) \sum_{j=1}^{n_x} p(x_j \mid y_i) \log p(x_j \mid y_i).$$

Given  two  such  random  variables  we  define  the  mutual  information  between 
them  to  be: 

$$I(X, Y) = H(X) - H(X \mid Y).$$

Note that H(X) − H(X|Y) = H(Y) − H(Y|X), so we see that I(X, Y) = I(Y, X).

For  a  DMC  whose  transmitter  random  variable  is  X,  and  whose  receiver 
random  variable  is  Y,  we  define  the  channel  capacity  [21]  to  be: 

$$C = \max I(X, Y),$$

where the maximization is over all possible distributions for X (that is, the p(x_i)
are all non-negative and sum to one).
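
The definitions above translate directly into code. The following sketch computes the mutual information of a DMC from an input distribution and a channel matrix, using the equivalent form H(Y) − H(Y|X). The function names and the list-of-rows matrix layout are our own assumptions.

```python
import math
from typing import Sequence


def entropy(dist: Sequence[float]) -> float:
    """Entropy in bits of a discrete distribution, taking 0 log 0 as 0."""
    return -sum(v * math.log2(v) for v in dist if v > 0)


def mutual_information(px: Sequence[float],
                       channel: Sequence[Sequence[float]]) -> float:
    """I(X, Y) = H(Y) - H(Y|X), where channel[i][j] = P(Y = y_j | X = x_i)."""
    n_out = len(channel[0])
    # Output distribution: p(y_j) = sum_i p(x_i) P(Y = y_j | X = x_i).
    py = [sum(px[i] * channel[i][j] for i in range(len(px)))
          for j in range(n_out)]
    # H(Y|X) is the average, over inputs, of the entropy of each row.
    h_y_given_x = sum(px[i] * entropy(channel[i]) for i in range(len(px)))
    return entropy(py) - h_y_given_x


if __name__ == "__main__":
    case_2_1 = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
    print(mutual_information([0.5, 0.5], case_2_1))
```

Capacity is then the largest value of this function over all input distributions; for a two-symbol input that is a one-dimensional search over the probability of the first symbol.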

In this situation p(a_0) = P(A = 0) = x, and p(a_1) = P(A = 0^c) = 1 − x.
Since varying x is varying all values of the input probabilities, the capacity of
the covert channel between Alice and Eve is

$$\max_{x} \{H(E) - H(E \mid A)\}.$$

H(E|A) can be trivially determined from the channel matrix. To calculate
H(E) we first must determine the distribution for E, which can be determined
from the conditional probabilities and the distribution for A. We see that:

$$\begin{aligned}
p(e_0) &= P(E = 0) \\
       &= P(E = 0 \mid A = 0) P(A = 0) + P(E = 0 \mid A = 0^c) P(A = 0^c) \\
       &= px + 0 \cdot (1 - x) = px,
\end{aligned}$$

$$\begin{aligned}
p(e_1) &= P(E = 1) \\
       &= P(E = 1 \mid A = 0) P(A = 0) + P(E = 1 \mid A = 0^c) P(A = 0^c) \\
       &= qx + p(1 - x),
\end{aligned}$$

and similarly,

$$p(e_2) = P(E = 2) = q(1 - x).$$


Therefore,

$$H(E) = -\{px \log px + [qx + p(1 - x)] \log[qx + p(1 - x)] + q(1 - x) \log q(1 - x)\}.$$

Now, let us calculate the conditional entropy

$$H(E \mid A) = -\sum_{i=0}^{1} p(a_i) \sum_{j=0}^{2} p(e_j \mid a_i) \log p(e_j \mid a_i).$$

This is:

$$-\big(P(A = 0)\{p \log p + q \log q + 0 \log 0\} + P(A = 0^c)\{0 \log 0 + p \log p + q \log q\}\big),$$

which simplifies to

$$H(E \mid A) = -\big(x\{p \log p + q \log q\} + (1 - x)\{p \log p + q \log q\}\big).$$

Thus, H(E|A) = h(p),6 so

$$I(E, A) = -\{px \log px + [qx + p(1 - x)] \log[qx + p(1 - x)] + q(1 - x) \log q(1 - x)\} - h(p),$$

and

$$C = \max_{x} \Big[-\{px \log px + [qx + p(1 - x)] \log[qx + p(1 - x)] + q(1 - x) \log q(1 - x)\} - h(p)\Big].$$

One way to find the maximum is to take the first derivative of I(E, A) with
respect to x, and set it equal to zero. Since

$$\frac{d}{dx}\Big[-\{px \log px + [qx + p(1 - x)] \log[qx + p(1 - x)] + q(1 - x) \log q(1 - x)\} - h(p)\Big]
= -\frac{1}{\ln 2}\Big\{p \ln p - (1 - p) \ln(1 - p) + p \ln x - (1 - p) \ln(1 - x) + (1 - 2p) \ln[(1 - 2p)x + p]\Big\} \tag{1}$$

(noting that the derivative of h(p) with respect to x is zero, and q = 1 − p), finding the zero of
dI(E, A)/dx is equivalent to solving the following equation for x.

$$p \ln p - (1 - p) \ln(1 - p) + p \ln x - (1 - p) \ln(1 - x) + (1 - 2p) \ln[(1 - 2p)x + p] = 0$$

Letting β = e^{(1 − p) ln(1 − p) − p ln p}, this reduces to solving

$$\beta[(1 - 2p)x + p]^{2p - 1} - x^{p}(1 - x)^{p - 1} = 0. \tag{2}$$

6 The notation h(p) denotes the function −p log p − (1 − p) log(1 − p).

When p = 1/2, we have that β = 1, and we are left with 1 − x^{1/2}(1 − x)^{−1/2} = 0.
Thus, when p = 1/2, the derivative (1) is zero, and the mutual information is
maximized, when x = 1/2. When p = 0, β = 1, and we are left with
x^{−1} − (1 − x)^{−1} = 0. Hence, when p = 0, the derivative (1) is also zero
when x = 1/2. When p = 1, we have that β = 1, and we are left with
(1 − x)^1 − x = 0. Thus, when p = 1, the derivative (1) is likewise zero
when x = 1/2. All of this might suggest that x = 1/2
always maximizes I(E, A), but this is not the case (see Figure 5).
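
To see that x = 1/2 is not optimal in general, one can locate the zero of the stationarity equation numerically. The sketch below uses simple bisection on the bracketed expression of (1), which is increasing in x for 0 < p < 1; bisection is our choice here for robustness, and the names `g` and `optimal_x` are illustrative.

```python
import math


def g(x: float, p: float) -> float:
    """Bracketed expression in (1); dI/dx equals -g(x, p) / ln 2."""
    return (p * math.log(p) - (1 - p) * math.log(1 - p)
            + p * math.log(x) - (1 - p) * math.log(1 - x)
            + (1 - 2 * p) * math.log((1 - 2 * p) * x + p))


def optimal_x(p: float, tol: float = 1e-12) -> float:
    """Zero of g on (0, 1) by bisection; requires 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 1e-15, 1.0 - 1e-15
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid, p) < 0:   # g is increasing in x, so the zero lies above mid
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for p in (0.2, 0.5, 0.8):
        print(p, round(optimal_x(p), 4))
```

For p = 1/2 the zero sits at x = 1/2, while for p = 0.2 it lies above 1/2 and for p = 0.8 it lies the same distance below.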

Unfortunately, we cannot solve (2) in general. That is, we are unable to derive
a closed form expression for the x value that zeroes the derivative (1) as a
function of p. Therefore, we numerically solve7 for the zero of (1), and use that
value to evaluate I(E, A); this gives us the capacity as a function of p. Figure 5
shows plots of both the zero of dI(E, A)/dx as a function of p, and the capacity
C(p).8 Note that the zero of dI(E, A)/dx is the x value that maximizes I(E, A).
That is, this choice of x determines the probability distribution of A (as stated
earlier P(A = 0) = x, and P(A = 0^c) = 1 − x) that achieves capacity (maximizes
the mutual information).
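
A direct way to approximate the capacity for a given p is to evaluate I(E, A) on a grid of x values and keep the largest. The sketch below does this with a spacing of .001; the function names are our own, and the result is accurate only to the grid resolution.

```python
import math


def _plogp(v: float) -> float:
    """v log2 v, taking 0 log 0 as 0."""
    return v * math.log2(v) if v > 0 else 0.0


def mutual_information(x: float, p: float) -> float:
    """I(E, A) for Case 2.1 with P(A = 0) = x and P(Clueless silent) = p."""
    q = 1.0 - p
    h_e = -(_plogp(p * x) + _plogp(q * x + p * (1 - x)) + _plogp(q * (1 - x)))
    h_e_given_a = -(_plogp(p) + _plogp(q))  # this is h(p)
    return h_e - h_e_given_a


def capacity(p: float, steps: int = 1000) -> tuple[float, float]:
    """Return (C, x*) from a grid search over x with spacing 1/steps."""
    best_x = max((i / steps for i in range(steps + 1)),
                 key=lambda x: mutual_information(x, p))
    return mutual_information(best_x, p), best_x


if __name__ == "__main__":
    for p in (0.0, 0.2, 0.5, 0.8, 1.0):
        c, x = capacity(p)
        print(f"p={p:.1f}  C={c:.4f}  x*={x:.3f}")
```

Because the mutual information is concave in x, the grid maximum lies within one grid step of the true maximizer, so refining `steps` tightens the estimate.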

We  see  in  Figure  5  certain  symmetries.  The  capacity  graph  is  symmetric 
about  p  =  .5,  and  the  graph  of  the  x  that  achieves  capacity  is  skew-symmetric 
about  p  =  .5  (when  p  =  .5  the  corresponding  x  is  also  .5).  Consider  the  two 
situations  where  p  =  e,  and  where  p  =  1  -  e;  in  both  situations  0  <  e  <  .5. 
Let  xe  be  the  probability  for  the  input  symbol  0  that  achieves  capacity  in  the 

7  At  this  juncture  we  could  have  numerically  determined  the  maximum  of  I{Ey  A). 
We  chose  instead  to  use  Newton’s  method  to  find  the  zero  of  the  derivative  (1). 
We  do  this  because  Newton’s  method  is  a  fast  method,  and  this  way  we  learn  more 
about  the  derivative  (1).  The  mutual  information  function  is  concave  down,  see  [7] 
[Thm.  4.4.2]&[5][Thm.2.7.4],  as  a  function  of  x,  and  since  in  this  paper  the  mutual 
information  is  never  locally  constant  (see  Def.  1  later  on  in  the  paper),  the  maximum 
(p  fixed)  is  achieved  for  one  and  only  one  x  value.  Therefore,  we  can  find  the  capacity 
as  follows.  Evaluate,  for  fixed  p,  the  mutual  information  as  a  function  of  x,  letting  x 
go  from  0  to  1  in  increments  of  .001.  Via  the  concavity  argument  this  will  give  the  x 
value  that  maximizes  the  mutual  information  to  the  nearest  .001.  This  is  the  capacity. 
This  method  and  Newton’s  method  gave  identical  results.  Later  in  the  paper  we  will 
not  differentiate  the  mutual  information  due  to  the  complexity  of  it  and  since  we 
will  not  be  able  to  obtain  closed  form  solutions  for  the  x  value  that  maximizes  the 
mutual  information.  We  will  instead  use  this  simpler  numerical  method. 

8  Holding p fixed we determine the zero of the derivative (1). We then evaluate
I(E, A) at the fixed value of p and the associated zero of (1) to determine
the capacity.

Fig. 5. Plots of covert channel capacity as a function of p, and of the x value that
maximizes the mutual information as a function of p.


first situation, and let $x_{1-\epsilon}$ be the probability that achieves capacity for the
second situation. For the first situation we have that $1 - x_\epsilon$ is the
capacity-achieving probability for the input symbol $0^c$, and similarly for the second
situation $1 - x_{1-\epsilon}$ is the capacity-achieving probability for the input symbol
$0^c$. Physically the two situations are "the same" if we reverse the roles of the
output symbols 0 and 2. Therefore $x_\epsilon = 1 - x_{1-\epsilon}$. Writing $x_\epsilon$ as
$x_\epsilon = \frac{1}{2} + \Delta$, we see that $x_{1-\epsilon} = \frac{1}{2} - \Delta$; this is
what the lower dotted plot shows in Figure 5 (ε = 1/2 ⇒ Δ = 0).

The  above  discussions  bring  to  light  two  important  observations  that  also 
hold  when  there  are  N  transmitters  in  addition  to  Alice. 

Observation 1 In conditions of very little extra traffic, or very high extra traffic,
the covert channel from Alice to Eve has higher bit rates.

Observation 2 The capacity C(p), as a function of p, is strictly bounded below
by C(.5), and C(.5) is achieved when the mutual information is evaluated at
x = .5.

Fig. 6. Channel for Case 2.1 with ε interference from Clueless.

It is obvious that very little extra traffic corresponds to very little noise. At
first glance though, it seems counterintuitive that heavy traffic also corresponds
to a small amount of noise. The explanation is that the high traffic is used as a
baseline against which to signal. This is analogous to transmission of bits over a
channel where the bit error rate (BER) $P_e$ is greater than 1/2. In this case, the
capacity of the channel is the same as that of a channel with BER of $1 - P_e$, by first
inverting all the bits. It is the in-between situations that negatively affect the
signaling ability of Alice. But, even in the noisiest case (i.e., where p = .5) Alice
can still transmit with a capacity of a half bit per tick.
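The bit-inversion analogy can be checked numerically. The following is a minimal sketch, assuming a binary symmetric channel whose capacity is 1 − h(Pe) bits per use; the function names `binary_entropy` and `bsc_capacity` are illustrative and do not come from the paper.

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_capacity(pe):
    """Capacity (bits per use) of a binary symmetric channel with bit error rate pe."""
    return 1.0 - binary_entropy(pe)

# A channel with BER above 1/2 has the same capacity as the channel with
# BER 1 - pe, because the receiver can simply invert every bit.
for pe in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"Pe = {pe:.1f}  capacity = {bsc_capacity(pe):.6f}")
```

The printed capacities are mirror images about Pe = 0.5, which is the same symmetry the covert channel shows about p = .5.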

Note that we can never guarantee error-free transmission, no matter how we
group the output symbols. In fact, it is possible that the outputs will always
be the symbol 1 (of course the probability of this quickly approaches zero as
the number of transmissions goes up). So this covert channel has a zero-error
capacity [22] of zero. Capacity is a useful measure of a communication channel
if the assumption is that the transmitter can transmit a large number of times.
With a large number of transmissions an error-correcting code can be utilized so
as to achieve a rate close to capacity. If the transmitter only makes a small
number of transmissions, then using the capacity alone can be misleading.


2.2  Case  2.2 — Alice  and  two  additional  transmitters  ( N  =  2) 

This is similar to Case 2.1, the difference being that we have three possible
transmitters: A (random variable as before) for Alice, who is attempting to
communicate covertly with E (random variable as before) for Eve, and two other
benign "Clueless" transmitters modeled by the random variables $C_1$ and $C_2$, for
Clueless$_1$ and Clueless$_2$, respectively. Since the MIX-firewalls only allow Eve to
count the number of outgoing messages, our covert channel has four possible
output symbols (the inputs are as before: 0, for Alice not sending a message, and
$0^c$, if Alice does send a message). The outputs are:

-  0: No one sends a message;

-  1: Alice sends a message, and neither Clueless$_i$ sends a message; or, Alice
does not send a message, and one, and only one, Clueless$_i$ sends a message;

-  2: Alice sends a message and one, and only one, Clueless$_i$ sends a message;
or, Alice does not send a message and both Clueless$_i$ send a message;

-  3: Alice, Clueless$_1$, and Clueless$_2$ all send a message.

As stated earlier we assume that Clueless$_1$ and Clueless$_2$ act independently of
each other. Therefore, if, as before, p is the probability of a clueless transmitter
(Clueless$_1$ or Clueless$_2$) not sending a message into the MIX-firewall, and q =
1 − p is the probability of a clueless transmitter sending a message, the conditional
probabilities of E given Alice sending 0 are:

-  If Alice sends a 0, and Eve receives a 0, then neither Clueless$_1$ nor
Clueless$_2$ sent a message; the conditional probability is $p^2$.

-  If Alice sends a 0, and Eve receives a 1, then one, but not both, of Clueless$_1$
or Clueless$_2$ sent a message into the MIX; the conditional probability is then
$2qp$, from (Clueless$_1$ yes, Clueless$_2$ no) or (Clueless$_1$ no, Clueless$_2$ yes),
which are disjoint events.

-  If Alice sends a 0, and Eve receives a 2, then both Clueless$_1$ and Clueless$_2$
sent a message into the MIX and the conditional probability is $q^2$.

-  If Alice sends a 0, Eve never receives a 3; thus the conditional probability is 0.

Similarly  we  can  analyze  the  case  when  Alice  sends  a  0C.  The  covert  channel 
diagram  and  channel  matrix  are  shown  in  Figure  7. 


(a) Channel transition diagram

(b) Channel matrix

\[
M_{2.2} =
\begin{array}{c|cccc}
      & 0   & 1   & 2   & 3   \\ \hline
0     & p^2 & 2qp & q^2 & 0   \\
0^c   & 0   & p^2 & 2qp & q^2
\end{array}
\]

Fig. 7. Channel for Case 2.2.


We can easily observe that the zero-error capacity is zero because the output
symbols 1 and 2 can both be received if 0 or $0^c$ is transmitted. Therefore there
is always some statistical error in what is received. This is similar to Case 2.1.
Now, what about the simpler notion of capacity? We again represent the input
random variable as A with distribution $P(A = 0) = p(a_0) = x$, and
$P(A = 0^c) = p(a_1) = 1 - x$. The output random variable is E with distribution
$P(E = j) = p(e_j)$, $j = 0, 1, 2, 3$, and the mutual information is
$I(E,A) = H(E) - H(E|A)$. So,

\[
I(E,A) = -\sum_{j=0}^{3} p(e_j)\log p(e_j)
         + \sum_{i=0}^{1}\sum_{j=0}^{3} p(a_i)\,p(e_j|a_i)\log p(e_j|a_i) .
\]

The $p(e_j|a_i)$ are the $i,j$ terms of the matrix $M_{2.2}$, and $p(a_0) = x$, so all we
need are the $p(e_j)$ terms. Since

\[
\begin{aligned}
p(e_j) &= p(e_j|a_0)\,p(a_0) + p(e_j|a_1)\,p(a_1) \\
       &= P(E = j|A = 0)\,P(A = 0) + P(E = j|A = 0^c)\,P(A = 0^c),
\end{aligned}
\]

we see that:

\[
\begin{aligned}
p(e_0) &= p^2 x, \\
p(e_1) &= 2qpx + p^2(1 - x), \\
p(e_2) &= q^2 x + 2qp(1 - x), \text{ and} \\
p(e_3) &= q^2(1 - x) .
\end{aligned}
\]


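So we see that

\[
\begin{aligned}
H(E) &= -\sum_{j=0}^{3} p(e_j)\log p(e_j) \\
     &= -\Big\{ p^2 x \log p^2 x
        + \big(2qpx + p^2(1-x)\big)\log\big(2qpx + p^2(1-x)\big) \\
     &\qquad + \big(q^2 x + 2qp(1-x)\big)\log\big(q^2 x + 2qp(1-x)\big)
        + q^2(1-x)\log q^2(1-x) \Big\} .
\end{aligned}
\]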


We also have that

\[
-H(E|A) = \sum_{i=0}^{1}\sum_{j=0}^{3} p(a_i)\,p(e_j|a_i)\log p(e_j|a_i) = 2[qp - h(p)] .
\]

Therefore, we see that the mutual information is

\[
\begin{aligned}
I(E,A) = &-\Big\{ p^2 x \log p^2 x
          + \big(2qpx + p^2(1-x)\big)\log\big(2qpx + p^2(1-x)\big) \\
         &\qquad + \big(q^2 x + 2qp(1-x)\big)\log\big(q^2 x + 2qp(1-x)\big)
          + q^2(1-x)\log q^2(1-x) \Big\} \\
         &+ 2[qp - h(p)] .
\end{aligned}
\]
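The closed form for Case 2.2 can be evaluated directly. The sketch below is illustrative only: it assumes base-2 logarithms and the convention 0 log 0 = 0, and the names `xlogx`, `h` and `mutual_information_case_2_2` are ours rather than the paper's.

```python
import math

def xlogx(v):
    """v * log2(v), with the convention 0 * log 0 = 0."""
    return v * math.log2(v) if v > 0.0 else 0.0

def h(p):
    """Binary entropy in bits."""
    return -xlogx(p) - xlogx(1.0 - p)

def mutual_information_case_2_2(x, p):
    """I(E, A) for Case 2.2; x = P(A = 0), p = P(a Clueless stays silent)."""
    q = 1.0 - p
    outputs = [
        p * p * x,                           # p(e_0)
        2 * q * p * x + p * p * (1.0 - x),   # p(e_1)
        q * q * x + 2 * q * p * (1.0 - x),   # p(e_2)
        q * q * (1.0 - x),                   # p(e_3)
    ]
    entropy_e = -sum(xlogx(v) for v in outputs)   # H(E)
    return entropy_e + 2.0 * (q * p - h(p))       # H(E) - H(E|A)

print(round(mutual_information_case_2_2(0.5, 0.5), 6))  # noisiest case, x = p = .5
print(mutual_information_case_2_2(0.3, 0.0), h(0.3))    # p = 0: I reduces to h(x)
```

At p = 0 or p = 1 the function returns h(x), so maximizing over x gives a capacity of one bit per tick in both boundary cases.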


We will often simply write the mutual information as $I$ instead of $I(A,E) =
I(E,A)$. Let us fix the value of p at some boundary values and see what happens
to the mutual information.

\[ I|_{p=0} = I|_{p=1} = h(x) . \]

Therefore, since capacity is the maximum over x of $I$, we see that (viewing
capacity as a function of p):

\[ C(p = 0) = C(p = 1) = 1 . \]


Certainly  since  the  input  is  limited  to  the  two  symbols  0  and  0C,  capacity  is 
bounded  between  zero  and  one.  Let  us  consider  the  channel  diagrams  in  these 
two  special  cases. 

In both of these cases we have a noiseless channel on two symbols. Therefore,
the capacity is $\max_x h(x)$, which is simply one. The more interesting cases, when
0 < p < 1, we solve numerically and plot the results in Figure 9. Of course, the
capacity is symmetric about .5 because of the inherent symmetry between p and q.

Figure 10 depicts on one plot the capacity from Case 2.1 (two transmitters:
Alice and Clueless) and the capacity from Case 2.2 (three transmitters: Alice,
Clueless$_1$, and Clueless$_2$).

Except for the boundary values, the capacity is always less for a given p
with three transmitters than with two. This is not surprising; the extra clueless
transmitter means extra noise. Note that the noisiest case is when p = .5; in
this case the channel diagram is given in Figure 11. In this case, the capacity is
achieved when P(A = 0) = x = 1/2, and the capacity is ≈ .3113 (this can be
argued through symmetry; we make it precise below in the general case).
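The comparison between two and three transmitters can be reproduced with the simple grid search over x described in footnote 7. This is a sketch under stated assumptions: the channel rows are generated from binomial counts of the Clueless transmitters (so n = 1 gives Case 2.1 and n = 2 gives Case 2.2), logarithms are base 2, and all function names are hypothetical.

```python
import math
from math import comb

def entropy(dist):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(v * math.log2(v) for v in dist if v > 0.0)

def silent_row(n, p):
    """Message-count distribution when Alice is silent and each of n Clueless
    transmitters stays silent with probability p."""
    q = 1.0 - p
    return [comb(n, k) * p ** (n - k) * q ** k for k in range(n + 1)]

def mutual_information(n, p, x):
    row_zero = silent_row(n, p) + [0.0]   # Alice sends 0
    row_send = [0.0] + silent_row(n, p)   # Alice sends 0^c: counts shift up by one
    out = [x * a + (1.0 - x) * b for a, b in zip(row_zero, row_send)]
    return entropy(out) - entropy(row_zero)   # both rows have the same entropy

def capacity(n, p, steps=1000):
    """Grid search over x = 0, 1/steps, ..., 1; returns (capacity, maximizing x)."""
    return max((mutual_information(n, p, i / steps), i / steps)
               for i in range(steps + 1))

for p in (0.1, 0.3, 0.5):
    c1, _ = capacity(1, p)
    c2, x2 = capacity(2, p)
    print(f"p = {p}: C(N=1) = {c1:.4f}, C(N=2) = {c2:.4f} at x = {x2:.3f}")
```

The grid resolution of .001 matches the increment mentioned in footnote 7; a finer grid only changes the result beyond the digits quoted in the text.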

Unfortunately  we  cannot  derive  closed  form  solutions  even  for  these  simple 
cases.  Therefore,  it  seems  unlikely  that  we  can  derive  a  closed  form  for  the  general 
case  of  N  clueless  transmitters  in  addition  to  Alice.  Of  course,  we  could  still 
derive  the  capacity  numerically.  However,  we  are  able  to  obtain  some  bounding 
results. 


2.3  Case  2.3 — Alice  and  N  additional  transmitters 

Case 2.3 is the general form of Scenario 2. Now⁹ we imagine that there are N + 1
transmitters, Alice is one of them, and the other N are all independent, identical
clueless transmitters. That is, there are transmitters Clueless$_1$, Clueless$_2$, ...,
Clueless$_N$. Again, Eve can only see how many messages are leaving the first
MIX-firewall headed for the second MIX-firewall. Therefore Eve can determine
if there are 0, 1, ..., N + 1 messages leaving the firewall. That is all Eve can
determine. Therefore, there are still the two input symbols $a_0 = 0$ and $a_1 = 0^c$,
but we have N + 2 output symbols. The probability that Clueless$_i$ does not send
a message is still p, and that it does send a message is q = 1 − p. Now, calculate
the channel matrix.

9  One could relax the assumption that all the Clueless$_i$ have identical and independent
behavior.

(a) p = 0    (b) p = 1

Fig. 8. Special cases for the channel diagram for Case 2.2.

Alice sends a 0.

-  For Eve to receive $e_k$ (that is, E = k), $0 \le k \le N$, we need k of the clueless
transmitters to send a message, and N − k not to send a message. Therefore,

\[ p(e_k|A = 0) = \binom{N}{k} p^{N-k} q^{k}, \quad 0 \le k \le N . \]

-  $p(e_{N+1}|A = 0) = 0$, since the event never happens because Alice is not
transmitting.

Fig. 9. Capacity as a function of p for three transmitters.


Alice sends a $0^c$.

-  $p(e_0|A = 0^c) = 0$, since the event never happens, because Alice is transmitting
so Eve must observe at least one message.

-  For Eve to receive $e_k$ (that is, E = k), $1 \le k \le N + 1$, we need k − 1 of the
clueless transmitters to send a message, and N − k + 1 not to send a message.
Therefore,

\[ p(e_k|A = 0^c) = \binom{N}{k-1} p^{N-k+1} q^{k-1}, \quad 1 \le k \le N + 1 . \]
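The two conditional distributions above are the two rows of the general channel matrix. A minimal sketch of that construction follows; the function name `channel_matrix` and the argument name `n` (standing for the paper's N) are illustrative assumptions.

```python
from math import comb

def channel_matrix(n, p):
    """Return [row for A = 0, row for A = 0^c]; each row lists p(e_k | A) for
    k = 0, ..., n + 1, where p is the probability that a Clueless stays silent."""
    q = 1.0 - p
    silent = [comb(n, k) * p ** (n - k) * q ** k for k in range(n + 1)]
    row_zero = silent + [0.0]   # Alice silent: e_{n+1} cannot occur
    row_send = [0.0] + silent   # Alice sends: e_0 cannot occur, counts shift by one
    return [row_zero, row_send]

for row in channel_matrix(2, 0.25):
    print(row)
```

For n = 2 the rows reduce to (p², 2qp, q², 0) and (0, p², 2qp, q²), the matrix of Case 2.2.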

Since $p(e_k) = p(e_k|A = 0)P(A = 0) + p(e_k|A = 0^c)P(A = 0^c)$, we have that

\[
\begin{aligned}
p(e_0) &= x p^N , \\
p(e_k) &= x\binom{N}{k} p^{N-k} q^{k} + (1-x)\binom{N}{k-1} p^{N-k+1} q^{k-1},
          \quad 1 \le k \le N, \text{ and} \\
p(e_{N+1}) &= (1-x) q^N .
\end{aligned}
\]

Fig. 10. Capacity as a function of p for both two and three transmitters.


So the entropy of E is

\[
\begin{aligned}
H(E) = -\Big\{ & xp^N \log xp^N \\
 & + \sum_{k=1}^{N}\Big[x\binom{N}{k}p^{N-k}q^{k} + (1-x)\binom{N}{k-1}p^{N-k+1}q^{k-1}\Big]
     \log\Big[x\binom{N}{k}p^{N-k}q^{k} + (1-x)\binom{N}{k-1}p^{N-k+1}q^{k-1}\Big] \\
 & + (1-x)q^N \log (1-x)q^N \Big\} .
\end{aligned}
\]

Fig. 11. Channel diagram for noisiest situation for Case 2.2.

(a) Channel transition diagram    (b) Channel matrix

Fig. 12. Channel for Case 2.3, the general case of N clueless users.

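The conditional entropy is a little easier to deal with.

\[
\begin{aligned}
H(E|A) &= -\Big( x\sum_{k=0}^{N}\Big[\binom{N}{k}p^{N-k}q^{k}\Big]\log\Big[\binom{N}{k}p^{N-k}q^{k}\Big] \\
       &\qquad + (1-x)\sum_{k=1}^{N+1}\Big[\binom{N}{k-1}p^{N-k+1}q^{k-1}\Big]\log\Big[\binom{N}{k-1}p^{N-k+1}q^{k-1}\Big] \Big) \\
       &= -\sum_{k=0}^{N}\Big[\binom{N}{k}p^{N-k}q^{k}\Big]\log\Big[\binom{N}{k}p^{N-k}q^{k}\Big] .
\end{aligned}
\tag{3}
\]

Observe that H(E|A) is independent of x. Therefore, to maximize the mutual
information we only need to maximize H(E).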
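The independence of H(E|A) from x is easy to confirm numerically. The sketch below is illustrative and assumes base-2 logarithms; `conditional_entropy` is a hypothetical helper that weights the entropies of the two channel rows by x and 1 − x.

```python
import math
from math import comb

def entropy(dist):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(v * math.log2(v) for v in dist if v > 0.0)

def conditional_entropy(n, p, x):
    """H(E|A) = x * H(E | A = 0) + (1 - x) * H(E | A = 0^c)."""
    q = 1.0 - p
    silent = [comb(n, k) * p ** (n - k) * q ** k for k in range(n + 1)]
    row_zero = silent + [0.0]   # Alice sends 0
    row_send = [0.0] + silent   # Alice sends 0^c
    return x * entropy(row_zero) + (1.0 - x) * entropy(row_send)

# The two rows contain the same probabilities, shifted by one position,
# so the weighted sum does not depend on the weight x.
values = [conditional_entropy(5, 0.3, x) for x in (0.0, 0.25, 0.5, 0.9, 1.0)]
print(values)
```

Because only H(E) varies with x, a capacity search only has to maximize the output entropy for each fixed p.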

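The mutual information is

\[
\begin{aligned}
I(E,A) = &-\Big\{ xp^N \log xp^N \\
 &\quad + \sum_{k=1}^{N}\Big[x\binom{N}{k}p^{N-k}q^{k} + (1-x)\binom{N}{k-1}p^{N-k+1}q^{k-1}\Big]
     \log\Big[x\binom{N}{k}p^{N-k}q^{k} + (1-x)\binom{N}{k-1}p^{N-k+1}q^{k-1}\Big] \\
 &\quad + (1-x)q^N \log (1-x)q^N \Big\} \\
 &+ \sum_{k=0}^{N}\Big[\binom{N}{k}p^{N-k}q^{k}\Big]\log\Big[\binom{N}{k}p^{N-k}q^{k}\Big] .
\end{aligned}
\tag{4}
\]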

For  Case  2.1  (one  Clueless  in  addition  to  Alice)  and  for  Case  2.2  (two  clueless 
in  addition  to  Alice)  we  discussed  the  symmetry  about  p  =  .5  informally.  Case 
2.3  includes  Cases  2.1  and  2.2  as  special  cases,  and  we  prove  this  symmetry  exists 
for  the  general  case. 

Theorem 1 $I(E,A)|_{x,p} = I(E,A)|_{1-x,q}$.

PROOF: By inspecting Eq. 4 we see that the last term ($-H(E|A)$) is independent
of x, and re-indexing its sum by $k \to N - k$ with $\binom{N}{k} = \binom{N}{N-k}$ shows
that it is unchanged when p and q are interchanged, so we can ignore it. The terms
$xp^N \log xp^N$ and $(1-x)q^N \log (1-x)q^N$ are interchanged when x and p are
interchanged with 1 − x and q, respectively. This leaves the complicated term in
the middle of Eq. 4. We define

\[ A_j(x,p) = x\binom{N}{j}p^{N-j}q^{j} + (1-x)\binom{N}{j-1}p^{N-j+1}q^{j-1} , \]

therefore the middle term is just $\sum_{k=1}^{N} A_k(x,p)\log A_k(x,p)$. We consider
the complementary j and N − j + 1 indices. Note,

\[ A_{N-j+1}(x,p) = x\binom{N}{N-j+1}p^{j-1}q^{N-j+1} + (1-x)\binom{N}{N-j}p^{j}q^{N-j} . \]

(There are always such complementary terms except for when N is odd and j is
the "middle" index $\lceil N/2 \rceil$. We will return to this special case.)

Consider $A_j(x,p)$ and the complementary $A_{N-j+1}(x,p)$. Using the identity
$\binom{N}{k} = \binom{N}{N-k}$ it trivially follows that $A_j(x,p) = A_{N-j+1}(1-x,q)$.
Therefore, since N − (N − j + 1) + 1 = j we see that

\[
\begin{aligned}
& A_j(x,p)\log A_j(x,p) + A_{N-j+1}(x,p)\log A_{N-j+1}(x,p) \\
&\quad = A_{N-j+1}(1-x,q)\log A_{N-j+1}(1-x,q) + A_j(1-x,q)\log A_j(1-x,q) .
\end{aligned}
\]

Now let us look at the special case where N is odd and we are considering
$A_{\lceil N/2 \rceil}(x,p)$, which does not have a complementary term, since
$\lceil N/2 \rceil = N - \lceil N/2 \rceil + 1$. However, it trivially follows that
$N - \lceil N/2 \rceil = \lceil N/2 \rceil - 1$, and hence we also trivially have that
$\binom{N}{\lceil N/2 \rceil} = \binom{N}{\lceil N/2 \rceil - 1}$. Therefore, by substitution
one sees that $A_{\lceil N/2 \rceil}(x,p) = A_{\lceil N/2 \rceil}(1-x,q)$. □
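Theorem 1 lends itself to a quick numerical sanity check. The following sketch is not part of the proof; it assumes base-2 logarithms and uses the hypothetical helper `mutual_information(n, p, x)` to compare I(E, A) at (x, p) with its value at (1 − x, q) for several N.

```python
import math
from math import comb

def entropy(dist):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(v * math.log2(v) for v in dist if v > 0.0)

def mutual_information(n, p, x):
    """I(E, A) = H(E) - H(E|A) for Alice plus n Clueless transmitters."""
    q = 1.0 - p
    silent = [comb(n, k) * p ** (n - k) * q ** k for k in range(n + 1)]
    row_zero = silent + [0.0]   # Alice sends 0
    row_send = [0.0] + silent   # Alice sends 0^c
    out = [x * a + (1.0 - x) * b for a, b in zip(row_zero, row_send)]
    return entropy(out) - entropy(row_zero)

worst = 0.0
for n in (1, 2, 3, 6, 7):          # includes odd N, the "middle index" special case
    for p in (0.05, 0.2, 0.5, 0.85):
        for x in (0.1, 0.37, 0.5, 0.9):
            gap = abs(mutual_information(n, p, x)
                      - mutual_information(n, 1.0 - p, 1.0 - x))
            worst = max(worst, gap)
print(f"largest |I(x,p) - I(1-x,q)| over the test grid: {worst:.2e}")
```

The reported gap is at the level of floating-point rounding, consistent with the symmetry the theorem states.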

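We will need the following in the rest of the paper so we will consider
$I(E,A)|_{p=.5} = H(E)|_{p=.5} - H(E|A)|_{p=.5}$ now.

Consider the entropy of E evaluated when p = 1/2:

\[
\begin{aligned}
H(E)|_{p=.5} = -\Big\{ & x\big(\tfrac{1}{2}\big)^N \log x\big(\tfrac{1}{2}\big)^N \\
 & + \sum_{k=1}^{N}\Big[x\binom{N}{k}\big(\tfrac{1}{2}\big)^N + (1-x)\binom{N}{k-1}\big(\tfrac{1}{2}\big)^N\Big]
     \log\Big[x\binom{N}{k}\big(\tfrac{1}{2}\big)^N + (1-x)\binom{N}{k-1}\big(\tfrac{1}{2}\big)^N\Big] \\
 & + (1-x)\big(\tfrac{1}{2}\big)^N \log (1-x)\big(\tfrac{1}{2}\big)^N \Big\} .
\end{aligned}
\tag{5}
\]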


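Consider the conditional entropy when p = 1/2:

\[
\begin{aligned}
H(E|A)|_{p=.5} &= -\sum_{k=0}^{N}\Big[\binom{N}{k}\big(\tfrac{1}{2}\big)^N\Big]\log\Big[\binom{N}{k}\big(\tfrac{1}{2}\big)^N\Big] \\
 &= -\big(\tfrac{1}{2}\big)^N\Big\{\sum_{k=0}^{N}\binom{N}{k}\log\binom{N}{k} - N\sum_{k=0}^{N}\binom{N}{k}\Big\} \\
 &= N - \big(\tfrac{1}{2}\big)^N\sum_{k=0}^{N}\binom{N}{k}\log\binom{N}{k} .
\end{aligned}
\]

Note that $H(E|A)|_{p=.5}$ is independent of x. Keep in mind that we may express
the mutual information evaluated at (x', p') by the slightly overloaded
notation $I(E,A)|_{x=x',p=p'}$. Of course $I(E,A)|_{p=p'}$ is simply a function of x, and
$I(E,A)|_{x=x'}$ is a function of p.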

Definition 1 We say that an arbitrary (real valued) function f is not locally-
constant iff for all x with f(x) defined at x, and for every δ > 0, there exists an
x' such that d(x', x) < δ (i.e., x' in the neighborhood of x) with f(x') ≠ f(x).

That  is,  for  no  neighborhood,  no  matter  how  small,  is  the  function  constant. 

Definition 2 We say that a function $f : [0,1] \to \mathbb{R}$ is symmetric about x = .5
iff f(x) = f(1 − x).

Observation 3 If f(x) is symmetric about x = .5 and it is concave down (convex
up) then f(.5) is a maximum (minimum) value. Further, if f(x) is not
locally-constant then .5 is the only such critical point.

Theorem 2 $I(E,A)|_{p=.5}$ is symmetric about x = .5.

PROOF: By Thm. 1 we see that $I(E,A)|_{x,.5} = I(E,A)|_{1-x,.5}$. □

Theorem 3 $C(.5) = I(E,A)|_{x=.5,p=.5}$.

PROOF: By Theorem 2, we know that $I(E,A)|_{p=.5}$ is symmetric about x =
.5, and [7][Thm. 4.4.2] & [5][Thm. 2.7.4] show that $I(E,A)|_{p=.5}$ (and in general
I(E,A) for fixed p) is concave down. Therefore, from Observation 3, $I(E,A)|_{p=.5}$
obtains its maximum value when x = .5. Since capacity, when p = .5, is the
maximum of $I(E,A)|_{p=.5}$, we are done. □

Theorem 4 $C(p) \ge I(E,A)|_{x=.5,p=.5}$.

PROOF: By definition $C(p) \ge I(E,A)|_{x=.5}$, since capacity is the maximum of
the mutual information. For x fixed, $I(E,A)|_x$ is a convex up function of p (see
[7][Thm. 4.4.2] & [5][Thm. 2.7.4]). By Thm. 1 we see that $I(E,A)|_{x=.5}$ is symmetric
about p = .5. By Observation 3 we see that $I(E,A)|_{x=.5} \ge I(E,A)|_{x=.5,p=.5}$. □

This allows us to use the simple single value $I(E,A)|_{x=.5,p=.5}$ as a lower
bound for the covert channel capacity.

Corollary 1 $C(p) \ge C(.5)$.

PROOF: Apply Theorems 3 and 4 together. □


Theorem 5 C(p) = C(1 − p), and if $x_p$ is the unique x such that $C(p) =
I(E,A)|_{x_p,p}$, then $x_{1-p} = 1 - x_p$.

PROOF: This trivially follows from Thm. 1 and the uniqueness of the critical
x value (which follows from the concavity properties and the fact that the mutual
information is not locally-constant, as can be seen by inspection of I(E,A)).

□
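Corollary 1 and Theorem 5 can be illustrated with the grid search used earlier in the paper. The sketch below is a numerical illustration only, with hypothetical function names, base-2 logarithms, and N = 3 chosen arbitrarily.

```python
import math
from math import comb

def entropy(dist):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(v * math.log2(v) for v in dist if v > 0.0)

def mutual_information(n, p, x):
    """I(E, A) = H(E) - H(E|A) for Alice plus n Clueless transmitters."""
    q = 1.0 - p
    silent = [comb(n, k) * p ** (n - k) * q ** k for k in range(n + 1)]
    row_zero = silent + [0.0]
    row_send = [0.0] + silent
    out = [x * a + (1.0 - x) * b for a, b in zip(row_zero, row_send)]
    return entropy(out) - entropy(row_zero)

def capacity(n, p, steps=1000):
    """Grid search over x; returns (capacity, maximizing x) to the nearest 1/steps."""
    return max((mutual_information(n, p, i / steps), i / steps)
               for i in range(steps + 1))

n = 3
c_half, _ = capacity(n, 0.5)
for p in (0.1, 0.2, 0.3, 0.4):
    c_p, x_p = capacity(n, p)
    c_q, x_q = capacity(n, 1.0 - p)
    print(f"p = {p}: C(p) = {c_p:.6f}, C(1-p) = {c_q:.6f}, "
          f"x_p = {x_p:.3f}, x_(1-p) = {x_q:.3f}, C(p) - C(.5) = {c_p - c_half:.6f}")
```

On this grid C(p) and C(1 − p) agree, the maximizing inputs mirror each other about x = .5, and every C(p) lies above C(.5).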

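Let us now use these results to bound capacity from below. We now consider
the formula for mutual information when x = p = .5. Thus, we study
$I(E,A)|_{x=.5,p=.5}$ as N varies. Let us first calculate $H(E|A)|_{x=.5,p=.5}$; since
H(E|A) is independent of x:

\[
H(E|A)|_{x=.5,p=.5} = H(E|A)|_{p=.5}
 = N - \big(\tfrac{1}{2}\big)^N\sum_{k=0}^{N}\binom{N}{k}\log\binom{N}{k} .
\]

From Eq. 5 we know $H(E)|_{p=.5}$.

In what follows we use the identity $\binom{N}{k} + \binom{N}{k-1} = \binom{N+1}{k}$, and
the fact that

\[ \sum_{k=1}^{N}\binom{N+1}{k} = (1 + 1)^{N+1} - 1 - 1 = 2^{N+1} - 2 . \]

So

\[
\begin{aligned}
H(E)|_{x=.5,p=.5} &= (N+1)\big(\tfrac{1}{2}\big)^N + \frac{N+1}{2^{N+1}}\big(2^{N+1} - 2\big)
   - \big(\tfrac{1}{2}\big)^{N+1}\sum_{k=1}^{N}\binom{N+1}{k}\log\binom{N+1}{k} \\
 &= N + 1 - \big(\tfrac{1}{2}\big)^{N+1}\sum_{k=1}^{N}\binom{N+1}{k}\log\binom{N+1}{k} .
\end{aligned}
\]

Since $I(E,A)|_{x=.5,p=.5} = H(E)|_{x=.5,p=.5} - H(E|A)|_{x=.5,p=.5}$, we see that:

\[
I(E,A)|_{x=.5,p=.5} = N + 1 - \big(\tfrac{1}{2}\big)^{N+1}\sum_{k=1}^{N}\binom{N+1}{k}\log\binom{N+1}{k}
 - \Big\{ N - \big(\tfrac{1}{2}\big)^N\sum_{k=0}^{N}\binom{N}{k}\log\binom{N}{k} \Big\} .
\]

Therefore,

\[
C(.5) = 1 + \big(\tfrac{1}{2}\big)^N\Big\{ \sum_{k=0}^{N}\binom{N}{k}\log\binom{N}{k}
 - \tfrac{1}{2}\sum_{k=1}^{N}\binom{N+1}{k}\log\binom{N+1}{k} \Big\} .
\]

Since $\binom{N+1}{0}\log\binom{N+1}{0} = 0$, we can simplify this to: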


\[
C(.5) = 1 - \big(\tfrac{1}{2}\big)^N\sum_{k=0}^{N}\Big\{ \tfrac{1}{2}\binom{N+1}{k}\log\binom{N+1}{k}
 - \binom{N}{k}\log\binom{N}{k} \Big\} .
\tag{6}
\]

Of course there are further relationships that can be exploited but they do
not seem to assist in the analysis, but rather seem to obfuscate the symmetries.

     N    C(.5)          N    C(.5)
     1    0.500000      14    0.049873
     2    0.311278      15    [illegible]
     3    0.219361      16    0.043799
     4    0.167553      17    [illegible]
     5    0.135170      18    0.039048
     6    0.113278      19    0.037039
     7    0.097558      20    0.035228
     8    0.085730      21    0.033586
     9    0.076502      22    [illegible]
    10    0.069092      23    0.030722
    11    0.063007      24    [illegible]
    12    0.057917      25    0.028309
    13    [illegible]

C(.5) = lower capacity bounds for all p, N = 1, ..., 25

The above table shows the results of numerical calculations of C(.5) to six
decimal places.
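The tabulated values can be regenerated from Theorem 3, which gives C(.5) as the mutual information at x = p = .5. The sketch below is a minimal illustration with base-2 logarithms; the helper name `c_half` is ours. At x = p = .5 the output distribution is binomial on N + 1 trials and each channel row is binomial on N trials.

```python
import math
from math import comb

def entropy(dist):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(v * math.log2(v) for v in dist if v > 0.0)

def c_half(n):
    """C(.5) for Alice plus n Clueless transmitters: H(E) - H(E|A) at x = p = .5."""
    out = [comb(n + 1, k) / 2 ** (n + 1) for k in range(n + 2)]   # p(e_k)
    row = [comb(n, k) / 2 ** n for k in range(n + 1)]             # p(e_k | A = 0)
    return entropy(out) - entropy(row)

for n in range(1, 26):
    print(f"{n:2d}  {c_half(n):.6f}")
```

Running the loop prints one value per N from 1 to 25, which can be compared against the legible entries of the table.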

Note  that  in  the  general  circumstances  of  Case  2.3,  if  p  =  0  (or  similarly 
q  =  0),  we  have  a  noiseless  channel  and  the  capacity  is  one,  which  is  achieved 
when  x  =  .5.  So  we  see  that  1  is  a  tight  upper  bound  for  the  capacity.  Therefore 
we  have  the  following  result: 

For Alice and N (N > 0) transmitters: C(.5) ≤ C(p) ≤ 1 and these bounds are tight.

Of  course  keep  in  mind  the  result  from  Case  2.0: 

For  Alice  and  no  additional  transmitters:  Capacity  =  1. 

Therefore the region between the two plots for the N values represents the
region where the capacity falls, depending on the behavior of the other, clueless
transmitters (and Alice's knowledge of and long-term adaptation to them).
Furthermore, the entire region is spanned by different choices of p (we ignore p for
the degenerate case of N = 0). See Figure 13.


As N grows so does the noise. Therefore, we see that the capacity is
non-increasing. We are interested in the lower bound C(.5). We have numerically
calculated C(.5) to N = 7750 and have observed that C(.5) decreases monotonically
toward zero (for N = 7750, C(.5) = .000093). We can (but do not, since it is
many pages in length) analytically show C(.5) is monotonic decreasing. That is
not surprising since increasing the number of clueless users increases the noise,
but it is surprising that it is so difficult to show that C(.5) goes to zero as N
goes to infinity. Below we discuss that fact in more detail.

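From Eq. (6) we can express C(.5) as

\[ C(.5) = 1 - \big(\tfrac{1}{2}\big)^N S(N), \]

where

\[
S(N) = \sum_{k=0}^{N}\Big\{ \tfrac{1}{2}\binom{N+1}{k}\log\binom{N+1}{k} - \binom{N}{k}\log\binom{N}{k} \Big\} .
\]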


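First we will simplify S(N).

Theorem 6 $S(N) = 2^N\log(N+1) - \sum_{k=0}^{N}\binom{N}{k}\log(k+1)$.

PROOF: Define

\[ a(N) = \sum_{k=0}^{N}\binom{N}{k}\log\binom{N}{k} . \]

By expanding $\log\binom{N}{k}$ as $\log N! - \log(N-k)! - \log k!$, and using
$\sum_{k=0}^{N}\binom{N}{k} = 2^N$, $\binom{N}{k} = \binom{N}{N-k}$, and
$\sum_{k=0}^{N} f(k) = \sum_{k=0}^{N} f(N-k)$, we have that

\[ a(N) = 2^N\log N! - 2\sum_{k=0}^{N}\binom{N}{k}\log k! . \]

Therefore,

\[ a(N+1) = 2^{N+1}\log(N+1)! - 2\sum_{k=0}^{N+1}\binom{N+1}{k}\log k! . \]

Since 1 log 1 = 0, we have that

\[
\begin{aligned}
S(N) &= \tfrac{1}{2}a(N+1) - a(N) \\
     &= 2^N\log(N+1) + 2\sum_{k=0}^{N}\binom{N}{k}\log k! - \sum_{k=0}^{N+1}\binom{N+1}{k}\log k! .
\end{aligned}
\]

Recalling the identity $\binom{N+1}{k} = \binom{N}{k} + \binom{N}{k-1}$, and using the fact
that $\binom{N}{k} = 0$ if k < 0 or k > N, we further simplify the above to:

\[
\begin{aligned}
S(N) &= 2^N\log(N+1) + \sum_{k=0}^{N+1}\binom{N}{k}\log k! - \sum_{k=0}^{N+1}\binom{N}{k-1}\log k! \\
     &= 2^N\log(N+1) + \sum_{k=0}^{N}\binom{N}{k}\log k! - \sum_{k=1}^{N+1}\binom{N}{k-1}\log k! \\
     &= 2^N\log(N+1) + \sum_{k=0}^{N}\binom{N}{k}\log k! - \sum_{k=0}^{N}\binom{N}{k}\log(k+1)!
        \quad\text{(after re-indexing the second sum)} \\
     &= 2^N\log(N+1) + \sum_{k=0}^{N}\binom{N}{k}\big[\log k! - \log(k+1)!\big] ,
\end{aligned}
\]

and since $\log(k+1)! = \log(k+1) + \log k!$,

\[ S(N) = 2^N\log(N+1) - \sum_{k=0}^{N}\binom{N}{k}\log(k+1) . \qquad □ \]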
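Theorem 6 can be cross-checked numerically by comparing S(N) computed as a(N+1)/2 − a(N) with the simplified closed form. This is only a floating-point sanity check with base-2 logarithms, and the function names are illustrative.

```python
import math
from math import comb

def a(n):
    """a(N) = sum over k of C(N, k) * log2 C(N, k)."""
    return sum(comb(n, k) * math.log2(comb(n, k)) for k in range(n + 1))

def s_definition(n):
    """S(N) written as a(N + 1) / 2 - a(N)."""
    return 0.5 * a(n + 1) - a(n)

def s_simplified(n):
    """Theorem 6: S(N) = 2^N log2(N + 1) - sum over k of C(N, k) log2(k + 1)."""
    return 2 ** n * math.log2(n + 1) - sum(
        comb(n, k) * math.log2(k + 1) for k in range(n + 1))

for n in (1, 2, 5, 10, 20):
    print(n, s_definition(n), s_simplified(n), 1.0 - s_simplified(n) / 2 ** n)
```

The last printed column is 1 − (1/2)^N S(N), that is, C(.5) for the same N.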

Keep in mind our goal is to study the behavior of C(.5) as N → ∞. However,
first we need a technical lemma.

Lemma 1 $\sum_{k=1}^{N}\binom{N}{k}k^p = 2^{N-p}Q_p(N)$, for p ≤ N, where $Q_p(N)$ is a
monic polynomial in N of degree p.

PROOF: In [17, Formulas 1,2,7,8,9,10 p. 608] or [19, Formula 34 p. 85] it is
shown that

\[
\sum_{k=1}^{N}\binom{N}{k}k^p = 2^{N-p}\binom{N}{p}p!
 + \sum_{i=1}^{p-1} 2^{N-i}(-1)^i\binom{N}{i}\sum_{j=1}^{i}(-1)^j\binom{i}{j}j^p .
\]

The term $2^{N-p}\binom{N}{p}p!$ is simply $2^{N-p}$ multiplied by N · (N − 1) ··· (N − p + 1),
which is simply $2^{N-p}$ times a monic polynomial in N of degree p. The other
term, $\sum_{i=1}^{p-1} 2^{N-i}(-1)^i\binom{N}{i}\sum_{j=1}^{i}(-1)^j\binom{i}{j}j^p$, is $2^{N-p}$
times a polynomial in N of degree less than p.

□
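Lemma 1 can be checked exactly for small p with integer arithmetic. The sketch below is illustrative: `power_sum` and `q_poly` are hypothetical helper names, and the closed forms N, N(N + 1) and N²(N + 3) for p = 1, 2, 3 are standard binomial identities stated here for comparison rather than taken from the paper.

```python
from fractions import Fraction
from math import comb

def power_sum(n, p):
    """Exact value of the sum over k = 1..n of C(n, k) * k**p."""
    return sum(comb(n, k) * k ** p for k in range(1, n + 1))

def q_poly(n, p):
    """Q_p(n) = power_sum(n, p) / 2**(n - p), kept exact as a Fraction."""
    return Fraction(power_sum(n, p) * 2 ** p, 2 ** n)

for n in (3, 10, 100, 1000):
    ratio = q_poly(n, 3) / (n + 1) ** 3
    print(n, q_poly(n, 1), q_poly(n, 2), q_poly(n, 3), float(ratio))
```

The final column shows Q_3(N)/(N + 1)^3 approaching 1 as N grows, which is the property of monic polynomials used in the next proof.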


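We are now ready for the major result.

Theorem 7 $\lim_{N\to\infty} C(.5) = 0$.

PROOF: We will prove the result by showing that $(\tfrac{1}{2})^N S(N)$ approaches one;
since $C(.5) = 1 - (\tfrac{1}{2})^N S(N)$, this suffices.

Our first step is to use natural logarithms instead of base two logarithms.

\[ S(N) = \frac{1}{\ln 2}\Big\{ 2^N\ln(N+1) - \sum_{k=0}^{N}\binom{N}{k}\ln(k+1) \Big\} . \]

Consider

\[
\begin{aligned}
\sum_{k=0}^{N}\binom{N}{k}\ln(k+1) &= \sum_{k=0}^{N}\binom{N}{N-k}\ln(N-k+1) \\
 &= \sum_{k=0}^{N}\binom{N}{k}\ln(N-k+1) \\
 &= \sum_{k=0}^{N}\binom{N}{k}\ln\Big[(1+N)\Big(1 - \frac{k}{1+N}\Big)\Big] \\
 &= \sum_{k=0}^{N}\binom{N}{k}\ln(1+N) + \sum_{k=0}^{N}\binom{N}{k}\ln\Big(1 - \frac{k}{1+N}\Big) \\
 &= 2^N\ln(N+1) + \sum_{k=0}^{N}\binom{N}{k}\ln\Big(1 - \frac{k}{1+N}\Big) .
\end{aligned}
\]

Now we use the Maclaurin series $\ln(1-x) = -\sum_{n=1}^{\infty}\frac{x^n}{n}$, which is valid
for |x| < 1,

\[
\sum_{k=0}^{N}\binom{N}{k}\ln\Big(1 - \frac{k}{1+N}\Big)
 = -\sum_{k=0}^{N}\binom{N}{k}\sum_{p=1}^{\infty}\frac{k^p}{p(1+N)^p} .
\]

(In what follows we do not give an epsilon-delta style proof. Rather we ignore
uniform convergence issues and freely pass terms in and out of the sums. This
is done in the interest of space and intuition.)

\[
 = -\sum_{p=1}^{\infty}\frac{1}{p(1+N)^p}\sum_{k=0}^{N}\binom{N}{k}k^p
\]

(Now we use Lemma 1)

\[
 = -2^N\sum_{p=1}^{\infty}\frac{(\tfrac{1}{2})^p}{p}\Big[\frac{Q_p(N)}{(N+1)^p}\Big] .
\]

We now can write $(\tfrac{1}{2})^N S(N)$ as

\[
\big(\tfrac{1}{2}\big)^N S(N) = \frac{1}{\ln 2}\sum_{p=1}^{\infty}\frac{(\tfrac{1}{2})^p}{p}\Big[\frac{Q_p(N)}{(N+1)^p}\Big] .
\]

Since $Q_p(N)$ is a monic polynomial in N of degree p,
$\lim_{N\to\infty}\Big[\frac{Q_p(N)}{(N+1)^p}\Big] = 1$. Therefore,

\[
\lim_{N\to\infty}\big(\tfrac{1}{2}\big)^N S(N) = \frac{1}{\ln 2}\sum_{p=1}^{\infty}\frac{(\tfrac{1}{2})^p}{p}
 = \frac{-1}{\ln 2}\ln\big(1 - \tfrac{1}{2}\big) = 1 .
\]

Since $C(.5) = 1 - (\tfrac{1}{2})^N S(N)$, we are done.

□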
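The slow decay of C(.5) can be observed numerically for large N. The sketch below uses C(.5) = 1 − (1/2)^N S(N) together with Theorem 6, which gives C(.5) = 1 − log(N + 1) + 2^{−N} Σ C(N, k) log(k + 1). Evaluating the binomial weights in log space via `math.lgamma` is our choice for avoiding overflow and is not necessarily how the authors computed their values; `c_half` is a hypothetical name.

```python
import math

def c_half(n):
    """C(.5) = 1 - log2(N + 1) + E[log2(K + 1)], where K ~ Binomial(N, 1/2)."""
    log_half_pow = -n * math.log(2.0)          # ln of (1/2)^N
    expected = 0.0
    for k in range(n + 1):
        # ln of C(N, k) * (1/2)^N, computed with log-gamma to avoid huge integers
        log_weight = (math.lgamma(n + 1) - math.lgamma(k + 1)
                      - math.lgamma(n - k + 1) + log_half_pow)
        expected += math.exp(log_weight) * math.log2(k + 1)
    return 1.0 - math.log2(n + 1) + expected

for n in (10, 100, 1000, 7750):
    print(f"N = {n:5d}  C(.5) = {c_half(n):.6f}")
```

The values shrink roughly in proportion to 1/N, in line with the limit of zero established by the theorem.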


2.4  Continuity 

For  Scenario  2  we  wished  to  say  that  capacity  was  a  continuous  function  of  p. 
We  thought  that  we  could  just  use  some  standard  information-theoretic  result. 
Unfortunately,  we  could  not  find  such  a  result.  We  do  not  think  that  it  would  be 
too  hard  to  argue  from  the  various  concavity  properties  of  mutual  information 
that  C(p)  is  a  continuous  function  (of  p).  However,  we  decided  to  present  a  more 
general  result. 

Theorem 8 Let F(x,p) be a continuous¹⁰ function defined on [0,1] × U, U an
arbitrary subset of the reals, and assume that for each fixed p, F(x,p) achieves
a maximum denoted as Γ(p). Then Γ(p) is a continuous function of p.

PROOF: If Γ(p) is not continuous, then ∃ a point of discontinuity $p_0$. This
means that there is an ε > 0 such that for any δ > 0, ∃ a $p_\delta$ such that
$|p_\delta - p_0| < \delta$ but $|\Gamma(p_\delta) - \Gamma(p_0)| \ge \epsilon$.

There is some $x_0$ such that $\Gamma(p_0) = F(x_0,p_0) = \max_x F(x,p_0)$ (there may
be more than one such "maximizing" x).

Keep in mind though that F(x,p) is a continuous function. This means that
for every $(t,p_0)$, $t \in [0,1]$, ∃ a $\delta_t > 0$ such that¹¹
$d\{(x,p),(t,p_0)\} < \delta_t \Rightarrow |F(x,p) - F(t,p_0)| < \epsilon$. The set
$\{(x,p) \mid d\{(x,p),(t,p_0)\} < \delta_t\}$ is called a $\delta_t$-neighborhood of $(t,p_0)$.
Every $\delta_t$-neighborhood of $(t,p_0)$ can be replaced with an open square box centered
about $(t,p_0)$ with side length $\delta_t$; we call this a $\delta_t$-box neighborhood of
$(t,p_0)$. This $\delta_t$-box neighborhood of $(t,p_0)$ is a proper subset of the
$\delta_t$-neighborhood of $(t,p_0)$.¹² Since $[0,1] \times p_0$ is a compact set (closed and
bounded) and the $\{\delta_t\text{-box neighborhood of } (t,p_0) \mid t \in [0,1]\}$ is a collection
of open sets that cover $[0,1] \times p_0$, it can be replaced by a finite subcollection that
also acts as a cover. Recall the point $x_0$; we require that the $\delta_{x_0}$-box
neighborhood of $(x_0,p_0)$ be in this finite subcollection (if it is not, just add it in).
Define a set $T = \{ t^i \mid i \in \{1,\ldots,N\},\ t^i \in [0,1], \text{ and } x_0 \text{ is one of the } t^i \}$
such that $\bigcup_{t^i \in T}\{\delta_{t^i}\text{-box neighborhood of } (t^i,p_0)\}$ covers
$[0,1] \times p_0$. For simplicity we refer to the union of these sets as FC. Let
$d = \frac{1}{2}\cdot\min\{\delta_{t^i}\}$ (this is where we use finiteness). Note that if
$(x',p) \in FC$, then ∃ $t^j \in T$ such that $|F(x',p) - F(t^j,p_0)| < \epsilon$.

Since Γ(p) is not continuous at $p_0$ we know that there is a $p_\zeta$ such that
$|p_\zeta - p_0| < d$ but $|\Gamma(p_\zeta) - \Gamma(p_0)| \ge \epsilon$. We know that there is some
$x_\zeta$ such that $\Gamma(p_\zeta) = F(x_\zeta,p_\zeta) = \max_x F(x,p_\zeta)$ (there may be more
than one such "maximizing" x). We have two cases to consider:

1.  $\Gamma(p_\zeta) > \Gamma(p_0)$: So $\Gamma(p_0) \le \Gamma(p_\zeta) - \epsilon$.
Since d was chosen to be minimal, by construction $(x_\zeta,p_\zeta) \in FC$. So for some
$t^j$ we have that $|F(x_\zeta,p_\zeta) - F(t^j,p_0)| < \epsilon$, which is the same as
$|F(t^j,p_0) - \Gamma(p_\zeta)| < \epsilon$. So $\Gamma(p_\zeta) - \epsilon < F(t^j,p_0)$, therefore
$\Gamma(p_0) < F(t^j,p_0)$, which is impossible since $\Gamma(p_0)$ cannot be less than
$F(x,p_0)$ for any x.

2.  $\Gamma(p_\zeta) < \Gamma(p_0)$: So $\Gamma(p_\zeta) \le \Gamma(p_0) - \epsilon$.
Recall that we constructed FC so that it would contain the $\delta_{x_0}$-box
neighborhood of $(x_0,p_0)$. Therefore, since $|p_\zeta - p_0| < d$, and d was chosen
minimal, we have that $(x_0,p_\zeta) \in \delta_{x_0}$-box neighborhood of $(x_0,p_0)$. Therefore,
$|F(x_0,p_\zeta) - F(x_0,p_0)| < \epsilon$, which is the same as $|F(x_0,p_\zeta) - \Gamma(p_0)| < \epsilon$.
So $\Gamma(p_0) - \epsilon < F(x_0,p_\zeta)$, therefore $\Gamma(p_\zeta) < F(x_0,p_\zeta)$, which is
impossible since $\Gamma(p_\zeta)$ cannot be less than $F(x,p_\zeta)$ for any x.

Hence we have a contradiction, so Γ(p) must be continuous.

□

We note that we used boxes instead of circles because it was easier to
construct a distance d so that all points would be guaranteed to be in FC.

It is not important that x ∈ [0,1]; what is important is that [0,1] is a compact
set. Note that if D is not a compact set there are counter-examples.

Corollary 2 Let F(x,p) be a continuous function, where p ∈ U and x ∈ D
where D is a closed and bounded subset of the real line. Assume that for each
fixed p, F(x,p) achieves a maximum denoted as Γ(p). Then Γ(p) is continuous
in p.

10  In this paper all functions are real valued.

11  d is the standard Euclidean metric in the plane.

12  Keep in mind that when we form any sort of neighborhood we must intersect it with
{[0,1] × U}, therefore our $\delta_t$-balls or $\delta_t$-boxes might not be actual balls or boxes.
They might have gaps in them and not extend symmetrically on both sides of $p_0$.
This is a technical point that we will not labor upon further. It does not affect the
proof. What is important is that they are open sets. We have also used the fact that
in a circle of radius r the largest box that can be inscribed (it is also centered about
the center of the circle) has side length √2 r; we use a smaller box. Also when we
construct d later we use ½ of a value; that is done since we are only looking at one
side of a box.

Since, for Scenario 2, we see by inspection that the mutual information is a
continuous function of (x,p), and x ∈ [0,1], we have the following result.

Theorem 9 For Scenario 2, C(p) is a continuous function.

We believe that continuity results such as this are important, but they seem
to be overlooked in the literature.
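The role of compactness in the continuity result can be seen in a small worked example. The function below is our own illustration and does not appear in the paper: it takes the x-domain to be D = [0, ∞), which is closed but not bounded, and U = [0, 1].

```latex
% Illustrative counter-example (an assumption of this note, not from the paper):
% a continuous F on a non-compact x-domain whose maximum over x jumps in p.
\[
F(x,p) = \frac{xp}{1 + x^2 p^2}, \qquad x \in D = [0,\infty), \quad p \in U = [0,1].
\]
% For p > 0, substitute u = xp: u/(1+u^2) is maximized at u = 1, i.e. at x = 1/p.
\[
\Gamma(p) = \max_{x \in D} F(x,p) = \tfrac{1}{2} \quad (p > 0),
\qquad
\Gamma(0) = \max_{x \in D} 0 = 0 .
\]
% F is continuous and attains its maximum for every fixed p,
% yet \Gamma is discontinuous at p = 0.
```

The maximizing point x = 1/p escapes to infinity as p → 0, which is exactly what a closed and bounded domain for x rules out.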

3  Comments,  Generalizations  &  Future  Work 

3.1  Comments 

We  first  note  that  despite  the  obfuscation  provided  by  MIX-firewalls,  and  the 
attendant  noise  introduced  by  other  transmitters,  Alice  is  still  able  to  transmit 
information  to  Eve.  At  this  point,  we  recall  our  earlier  observations  and  add  to 
them  below. 

1.  In  conditions  of  very  little  extra  traffic,  or  very  high  extra  traffic,  the  covert 
channel  from  Alice  to  Eve  has  higher  bit  rates. 

2.  The  capacity  C(p),  as  a  function  of  p  is  strictly  bounded  below  by  C(.5), 
and  C(.5)  is  achieved  when  the  mutual  information  is  evaluated  at  x  =  .5 
(of  course  p  =  .5  also  in  this  situation). 

3.  The  capacity  C(p),  as  a  function  of  p  is  strictly  bounded  below  by  a  function 
that  decreases  monotonically  to  zero  as  the  number  of  transmitters  increases, 
but  is  never  zero. 

4.  The  bias  in  the  code  used  by  Alice  to  achieve  the  optimum  data  rate  on 
the  channel  is  not  always  x  =  0.5,  but  it  is  never  far  from  0.5,  and  our 
preliminary  experimental  results  indicate  that  the  difference  in  capacity  is 
minor. 

The  last  observation  agrees  with  [9],  which  presents  the  general  result  that  in 
DMCs,  capacity  obtained  by  using  x  =  .5  is  no  less  than  94.21%  of  the  optimum 
channel  capacity.  Even  if  Alice  has  no  knowledge  of  the  probabilistic  behavior  of 
the  other  transmitters,  her  data  rate  will  not  be  too  far  from  optimal  if  she  uses 
an  unbiased  code.  (Note,  however,  that  the  coding  rate  is  very  much  dependent 
on  knowledge  of  the  number  of  other  transmitters  and  their  behavior.) 
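
The observations above can be checked numerically with a short sketch. It rests on an assumed reading of Scenario 2 that is not restated in this section: each of N Clueless senders independently stays silent with probability p, Alice stays silent with probability x, and Eve observes only the total number of messages in a tick. The function names (output_dists, mutual_information, capacity) and the grid search over x are ours, chosen for illustration.

```python
from math import comb, log2


def output_dists(n, p):
    """Eve's message-count distribution when Alice is silent / sending.

    Assumed model: each of n Clueless senders is silent with probability p.
    """
    q = 1.0 - p
    binom = [comb(n, k) * q**k * p**(n - k) for k in range(n + 1)]
    silent = binom + [0.0]    # Eve counts k messages
    sending = [0.0] + binom   # Eve counts k + 1 messages
    return silent, sending


def entropy(dist):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(v * log2(v) for v in dist if v > 0.0)


def mutual_information(x, n, p):
    """I(E; A) in bits when Alice stays silent with probability x."""
    silent, sending = output_dists(n, p)
    mix = [x * s + (1.0 - x) * t for s, t in zip(silent, sending)]
    return entropy(mix) - x * entropy(silent) - (1.0 - x) * entropy(sending)


def capacity(n, p, steps=2000):
    """Grid search over Alice's bias x; returns (capacity estimate, best x)."""
    return max(
        (mutual_information(i / steps, n, p), i / steps)
        for i in range(steps + 1)
    )


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.5, 0.9, 1.0):
        c, x = capacity(3, p)
        print(f"N=3 p={p:.1f}  C~{c:.4f} bits at x~{x:.3f}")
```

Comparing mutual_information(0.5, n, p) with capacity(n, p)[0] gives a direct view of how little Alice loses by using an unbiased code under this assumed model.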

3.2  Future  Work 

Following  up  the  last  observation  from  the  preceding  subsection,  we  note  that  it 
does  not  hurt  Alice  too  much  if  she  does  not  use  the  optimum  bias  in  her  code 
(i.e.,  she  does  not  know  much  about  p).  However,  the  choice  of  code  will  depend 
greatly on the channel capacity, among other characteristics. It appears that under
less noisy conditions (p near 0 or p near 1), the load, L(N,p) = pN, in expected
packets per tick sent by the other transmitters, dominates in determining the
capacity. That is, for small p or 1 - p, defining I_N(E, A) as the mutual information
with N other transmitters,

$$ I_N(E,A)\big|_{kp} \approx I_{kN}(E,A)\big|_{p}. $$

For intermediate values of p (i.e., p near 0.5), the capacity is mostly influenced
by N, the number of transmitters. As N increases, experimental results show
that  the  curves  of  C(p)  versus  p  become  increasingly  “flat-bottomed,”  hence 
are  less  sensitive  to  p  for  the  intermediate  values  of  p.  So  for  Alice,  knowing 
N  is  crucial  unless  the  loads  are  rather  low,  in  which  case  the  load  is  the  most 
important  factor. 
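
The load-scaling approximation can be probed with a small sketch. It assumes a counting-channel reading of Scenario 2 (Eve sees only the number of messages per tick, and each of N other senders acts independently with parameter p); this model and the names mutual_information and scaled_pair are our assumptions. Because the check fixes Alice's bias at x = 0.5, where the assumed channel is symmetric under swapping p and 1 - p, it does not depend on whether p denotes sending or not sending.

```python
from math import comb, log2


def mutual_information(x, n, p):
    """I(E; A) in bits for the assumed counting channel with n other senders."""
    q = 1.0 - p
    binom = [comb(n, k) * q**k * p**(n - k) for k in range(n + 1)]
    silent = binom + [0.0]    # Alice silent: Eve counts k
    sending = [0.0] + binom   # Alice sends: Eve counts k + 1
    mix = [x * s + (1.0 - x) * t for s, t in zip(silent, sending)]

    def h(dist):
        return -sum(v * log2(v) for v in dist if v > 0.0)

    return h(mix) - x * h(silent) - (1.0 - x) * h(sending)


def scaled_pair(n, p, k, x=0.5):
    """Return (I_N at kp, I_kN at p), the two sides of the approximation."""
    return mutual_information(x, n, k * p), mutual_information(x, k * n, p)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.25):
        left, right = scaled_pair(2, p, 2)
        print(f"p={p:.2f}  I_2 at 2p = {left:.4f}   I_4 at p = {right:.4f}")
```

Running it for increasing p shows where the two sides begin to separate, which is the regime in which N rather than the load is expected to matter.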

For Scenario 2 we assumed that every Clueless_i was governed by the same
probability distribution. The probability p was that of Clueless_i not sending a message.
One can generalize Scenario 2 to allow these probabilities to vary. That is, we
can assign the probability p_i to Clueless_i not sending a message. Of course this
changes the analysis that we have given above. We conjecture that the observations
regarding the load and number of transmitters remain true as long as the
p_i's are not too different. The case of varying probabilities will be taken up in
future  work.  However,  we  feel  that  our  simplistic  assumptions  serve  to  show  the 
difficulty  of  the  analysis  and  to  show  some  general  trends.  Furthermore,  we  feel 
that  our  assumptions  are  a  good  gross  model  of  system  behavior. 

In  future  work  we  will  also  analyze  the  situation  where  we  have  only  an  exit 
point  MIX-firewall  as  shown  below. 


[Figure: Alice and Clueless_i, MIX-firewall, Eve, Receivers]

Fig. 14. Scenario with exit point MIX-firewall only.


We have M receivers denoted R_1, ..., R_M. Eve still does not know directly
who sent a message, but Eve does know where messages are going. This increases
the capacity of the covert channel. Instead of just sending 0 or 0^c, Alice can now
send: 0 (not transmitting); 1 (message to the first receiver), ..., i (message to
the ith receiver), ..., M (message to the Mth receiver). The greatest the capacity
can be is log(M + 1). Of course if M = 1 the situation reduces to Scenario 2.
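
The log(M + 1) ceiling follows from a standard chain of inequalities, sketched here with A denoting Alice's input symbol and E denoting Eve's observation (this notation is ours, chosen to match I(E, A) used earlier).

```latex
C \;=\; \max_{P_A} I(E;A)
  \;\le\; \max_{P_A} H(A)
  \;\le\; \log\bigl|\{0,1,\dots,M\}\bigr|
  \;=\; \log(M+1).
```

Equality would require both a uniform input distribution and an observation from which Eve can recover Alice's symbol exactly, so traffic from the other senders can only lower the capacity below this bound.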

As before, the Clueless_i are assumed independent, and one may allow their
distributions to be identical or they may vary.

Related to this is an intermediate question of the nature and capacity of
covert channels in a network of MIXes (with Eve as GPA or Eve as an RPA
restricted to observing the traffic between MIX-firewalls). Now there are Clueless_i's
at every ingress MIX and Receiver_j's at every egress MIX, again with a variety
of possible characteristics.

Fig. 15. Exit firewall only (figure not reproduced; surviving label: R_i).

Other areas begging for further investigation include scenarios in which there
is limited network capacity (on links or aggregate), and whether or not there is
anonymity. We are currently investigating this using the model in which at most
B messages can be sent through the network (as output from a sender or as
output  of  a  MIX-firewall)  in  a  given  tick,  and  if  there  are  more  than  B  messages 
awaiting  transmission,  B  of  them  are  chosen  at  random  for  delivery.  This  may 
relate  the  work  to  more  sophisticated  MIX  models,  such  as  pool  MIXes,  which 
is  also  desirable. 
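
The capacity-limited delivery rule can be sketched for a single tick as follows. This is only an illustration under stated assumptions: the function name deliver_tick is hypothetical, and the text does not say what happens to messages that are not chosen, so the sketch simply returns them to the caller (who may drop or re-queue them).

```python
import random


def deliver_tick(pending, b, rng=random):
    """Choose at most b pending messages uniformly at random for delivery.

    Returns (delivered, held). What becomes of `held` is left to the caller,
    since the source does not specify it (an assumption of this sketch).
    """
    if len(pending) <= b:
        return list(pending), []
    chosen = set(rng.sample(range(len(pending)), b))
    delivered = [m for i, m in enumerate(pending) if i in chosen]
    held = [m for i, m in enumerate(pending) if i not in chosen]
    return delivered, held


if __name__ == "__main__":
    queue = ["alice-1", "clueless-1", "clueless-2", "clueless-3"]
    sent, waiting = deliver_tick(queue, b=2)
    print("delivered:", sent)
    print("held back:", waiting)
```

Passing a seeded random.Random instance as rng makes a run reproducible, which is convenient when estimating channel statistics by simulation.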

A  deeper  issue  raised  in  this  preliminary  paper  is  that  of  the  relationship 
between  anonymity  and  covert  channel  capacity  (fixing  the  other  factors  that 
affect  capacity).  It  seems  evident  that  as  system  level  anonymity  increases  in 
the  simple  models  shown  here  (i.e.,  the  number  of  potential  senders  increases), 
the  minimum  capacity  decreases  to  zero.  However,  as  the  probability  that  a 
clueless  sender  transmits  in  a  given  tick  increases,  the  expected  number  of  actual 
senders  in  a  given  time  tick  also  increases,  hence  the  anonymity  increases,  but 
the  capacity  of  the  covert  channel  increases  once  this  probability  exceeds  0.5. 
The  relationships  are  not  simple,  but  their  discovery  has  the  potential  to  increase 
our  understanding  of  fundamental  aspects  of  network  design. 